WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

DRAM hanging on Merging ORF annotations step #232

Closed by Rridley7 1 year ago

Rridley7 commented 1 year ago

Hi, thank you for your work on this tool! I am having an issue where DRAM freezes on the "Merging ORF annotations" step and does not continue the workflow after it. I am running DRAM in a Snakemake environment on a single sample. The error log file from the run is attached: dram_run.txt

rmFlynn commented 1 year ago

Can you edit the time limit in Slurm? Give it a long time. From your log:

slurmstepd: error: *** JOB 42095 ON atl1-1-02-003-21-2 CANCELLED AT 2022-11-06T00:23:00 DUE TO TIME LIMIT ***

Rridley7 commented 1 year ago

I can give it a longer time. I gave it 24 hrs, but over 20 hrs were spent in the "Merging ORF annotations" step. Is this to be expected?

rmFlynn commented 1 year ago

Not usually, but it is possible depending on the file system and what else is running. How many times did it do this?

Rridley7 commented 1 year ago

I have attempted it about 5 or 6 times, each time with the same result. It did finish in a much shorter time when I ran it on the already predicted genes using annotate_genes rather than on the contigs file with annotate. However, I would like the additional information provided by the other annotations.

rmFlynn commented 1 year ago

You want the other genes but not the GenBank files? I'll have to double check, but sometimes GenBank can cause things to hang. The generation of those files is, for some reason, an exponential process. If annotate_genes is running great, then maybe that's it. You could call all your genes with Prodigal and then combine them. Then again, maybe it's time I made a skip-GenBank option.
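A minimal sketch of that workaround, for readers who want to try it: call genes with Prodigal, then pass the protein FASTA to annotate_genes. The file names are hypothetical, and the `DRAM.py annotate_genes -i ... -o ...` form is an assumption that should be checked against `DRAM.py annotate_genes --help` for your installed version. The code only builds and prints the commands; it does not run them.

```python
import shlex


def prodigal_cmd(contigs, proteins_out, coords_out, mode="meta"):
    """Build a Prodigal call that writes protein translations of the called genes.

    -i input contigs, -a protein translations, -o gene coordinates,
    -p procedure ("meta" for metagenomic input, "single" for one genome).
    """
    return ["prodigal", "-i", contigs, "-a", proteins_out,
            "-o", coords_out, "-p", mode]


def dram_annotate_genes_cmd(proteins, out_dir):
    """Build the DRAM call on pre-called genes (assumed CLI form, verify with --help)."""
    return ["DRAM.py", "annotate_genes", "-i", proteins, "-o", out_dir]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        prodigal_cmd("contigs.fasta", "genes.faa", "genes.coords"),
        dram_annotate_genes_cmd("genes.faa", "dram_genes_out"),
    ]
    for step in steps:
        print(shlex.join(step))  # pass each list to subprocess.run(step, check=True) to execute
```

As noted later in the thread, this route does not produce the tRNA or rRNA calls that the full annotate pipeline provides.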

Rridley7 commented 1 year ago

I'm not super concerned about the GenBank files; I mostly want the annotation information in any format. From what I understand, however, Prodigal does not call the tRNA or rRNA information either, which would be useful for me. This would only be available through running the entire pipeline, correct?

A somewhat related side question: is it possible to include input from another annotation source, say genes we have already called with uniprot90, and add it directly into the DRAM annotate and distill functions?

rmFlynn commented 1 year ago

To part one: the DRAM paper does document the way we call rRNA and tRNA, so you could recreate that part more or less quickly and just drop the files in with the correct names. DRAM just looks for correctly formatted files with the correct names, so you can mess with the files a lot. I may make a separate tool in the coming weeks, but I don't have time at the moment. You would be better served by waiting for the GenBank process to finish. This is not ideal, but there is no time for a rewrite at the moment.

To part two: it is possible. You would just need to match the format of the annotations file and add your annotations on. I would use caution with the results, however; keep in mind that UniRef plays second fiddle to KOfam. This would fall into the category of hacking the program in the softest sense.
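As a rough illustration of "match the format and add it on", here is a minimal sketch that appends columns from a second tab-separated table to a DRAM-style annotations table, matched on the gene ID in the first column. It assumes both files are TSVs with a header row and gene IDs in the first column; the function name and file names are hypothetical and not part of DRAM, and whether distill picks up a given added column is not something this sketch can guarantee.

```python
import csv


def add_annotation_columns(annotations_path, extra_path, out_path):
    """Append the columns of extra_path to annotations_path, matched on gene ID.

    Assumes both files are tab-separated with a header row and the gene ID
    in the first column. Genes missing from extra_path get empty cells.
    """
    with open(extra_path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        extra_header = next(reader)[1:]          # drop the ID column name
        extra = {row[0]: row[1:] for row in reader if row}

    blank = [""] * len(extra_header)
    with open(annotations_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(next(reader) + extra_header)
        for row in reader:
            if row:
                # Keep the original row order; only add columns on the right.
                writer.writerow(row + extra.get(row[0], blank))
```

Writing to a new file rather than editing in place keeps the original annotations available for comparison, which seems prudent given the caution above about how the added hits rank against KOfam.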