WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0
249 stars, 52 forks

Correct syntax for the program "DRAM.py merge_annotations" #290

psampara closed this issue 1 year ago

psampara commented 1 year ago

Hello, I am trying to run the "DRAM.py merge_annotations" program on individually annotated bins, but the program does not recognize the contents of the directories. Could you help me with the correct syntax for running "merge_annotations" on previously annotated individual bins?

My annotations are in the directory "/08_dram/annotations", and each subdirectory in it holds one individually annotated bin. For example, the annotations for one bin are in "08_dram/annotations/annotation_bin.500".

I tried to merge the annotations in several ways:

  1. DRAM.py merge_annotations -i /08_dram/annotations/* -o /08_dram/merge_anno

This command resulted in the following error:

usage: DRAM.py [-h] {annotate,annotate_genes,distill,strainer,neighborhoods,merge_annotations} ...
DRAM.py: error: unrecognized arguments: annotations/annotation_bin.123/ annotations/annotation_bin.125/ ...

(The error continues, listing every directory in "/08_dram/annotations".)

  2. I also tried the same command without the "*" wildcard: DRAM.py merge_annotations -i /08_dram/annotations/ -o /08_dram/merge_anno

This resulted in the following error:

2023-08-03 10:45:44,074 - The log file is created at merge_anno/annotate.log
2023-08-03 10:45:44,075 - Skipping the fallowing directory because no 'annotations.tsv' file was found: /scratch/st-ziels-1/psampara/pao_sip/08_dram/annotations
Traceback (most recent call last):
  File "/scratch/st-ziels-1/psampara/dram_env/bin/DRAM.py", line 207, in <module>
    args.func(**args_dict)
  File "/scratch/st-ziels-1/psampara/dram_env/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1881, in merge_annotations_cmd
    merge_annotations(annotations_list, output_dir, write_annotations=True)
  File "/scratch/st-ziels-1/psampara/dram_env/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1775, in merge_annotations
    all_annotations = pd.concat(
  File "/scratch/st-ziels-1/psampara/dram_env/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 372, in concat
    op = _Concatenator(
  File "/scratch/st-ziels-1/psampara/dram_env/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 429, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

That makes sense: I passed the parent directory itself rather than the bin directories. The parent contains no annotations.tsv, so the annotation files inside the bin subdirectories were never searched.
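The log line above says the path was skipped because it contained no 'annotations.tsv'. The following is a minimal, hypothetical sketch of that behaviour, not DRAM's actual code. It assumes that the -i value is treated as a glob pattern and that each matched path must be a directory with an annotations.tsv inside. The quoted pattern working in the follow-up comment suggests this. The function name find_annotation_dirs is illustrative.

```python
import glob
import os
import tempfile


def find_annotation_dirs(pattern: str) -> list[str]:
    """Hypothetical sketch: keep only glob matches that contain an annotations.tsv file."""
    kept = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(os.path.join(path, "annotations.tsv")):
            kept.append(path)
        else:
            print(f"Skipping {path}: no 'annotations.tsv' found")
    return kept


# Throwaway layout mimicking annotations/annotation_bin.*/annotations.tsv
root = tempfile.mkdtemp()
annotations = os.path.join(root, "annotations")
for name in ("annotation_bin.123", "annotation_bin.125"):
    os.makedirs(os.path.join(annotations, name))
    open(os.path.join(annotations, name, "annotations.tsv"), "w").close()

# A pattern without wildcards matches only the parent directory, which has no annotations.tsv.
print(find_annotation_dirs(annotations))  # prints a "Skipping" line, then []
# A pattern over the bin subdirectories matches each bin.
print(len(find_annotation_dirs(os.path.join(annotations, "*"))))  # 2
```

An empty list is exactly what pandas.concat refuses, hence "ValueError: No objects to concatenate" in the traceback. Under the same assumption, a flat folder of renamed files would also match nothing usable, because each match is a file rather than a bin directory.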

  3. Next, I suspected the problem was in locating the individual files. I copied the files from each bin's annotation directory into one new directory and prefixed each file name with its bin name. For example, the files in "08_dram/annotations/annotation_bin.123" went to "08_dram/annotations/all_files" under names such as "08_dram/annotations/all_files/bin.123_annotations.tsv". The "08_dram/annotations/all_files" folder now contains the annotation files for all bins.

Then I ran the program on this directory: DRAM.py merge_annotations -i 08_dram/annotations/all_files/* -o 08_dram/merge_anno

However, this also produced an error:

usage: DRAM.py [-h] {annotate,annotate_genes,distill,strainer,neighborhoods,merge_annotations} ...
DRAM.py: error: unrecognized arguments: bin.112_annotate.log bin.112_annotations.tsv ...

(The error continues, listing every file in the directory.)
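Both "unrecognized arguments" errors share one cause. An unquoted * is expanded by the shell before DRAM.py starts, so -i receives only the first match and every other match becomes a stray argument. The sketch below reproduces this with Python's argparse, assuming (as the error messages suggest) that -i takes a single value. The parser, the dest name input_pattern, and split_known are hypothetical stand-ins, not DRAM's own code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the merge_annotations options: -i and -o each take ONE value.
    parser = argparse.ArgumentParser(prog="merge_sketch")
    parser.add_argument("-i", dest="input_pattern", required=True)
    parser.add_argument("-o", dest="output_dir", required=True)
    return parser


def split_known(argv: list[str]):
    """Return (parsed namespace, leftover arguments argparse could not place)."""
    return build_parser().parse_known_args(argv)


# What DRAM.py receives after the shell expands an UNQUOTED annotations/* (two bins here).
expanded = ["-i", "annotations/annotation_bin.123/", "annotations/annotation_bin.125/",
            "-o", "merge_anno"]
ns, extra = split_known(expanded)
print(ns.input_pattern)  # annotations/annotation_bin.123/
print(extra)             # ['annotations/annotation_bin.125/']

# What DRAM.py receives when the pattern is QUOTED: one literal argument.
quoted = ["-i", "annotations/*", "-o", "merge_anno"]
ns, extra = split_known(quoted)
print(ns.input_pattern)  # annotations/*
print(extra)             # []
```

parse_known_args is used here only to expose the leftovers. A regular parse_args call rejects them, which yields the "error: unrecognized arguments" message seen above.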

What is the correct syntax to run the "merge_annotations" program?

Thanks!
Pranav

psampara commented 1 year ago

I figured out that the input pattern must be quoted, so that the shell passes it to DRAM.py unexpanded instead of splitting it into one argument per directory:

DRAM.py merge_annotations -i 'annotations/*' -o merge_anno
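Quoting matters only because a shell sits between the user and DRAM.py. If DRAM.py is launched from Python without a shell, each list element becomes exactly one argument, so the pattern arrives unexpanded. This has the same effect as the quotes above. A minimal sketch follows, assuming DRAM.py is on PATH. The helper names merge_command and run_merge are hypothetical.

```python
import shlex
import subprocess


def merge_command(pattern: str, output_dir: str) -> list[str]:
    # Each element is passed as one argv entry; no shell is involved,
    # so the '*' reaches DRAM.py literally (like quoting it in bash).
    return ["DRAM.py", "merge_annotations", "-i", pattern, "-o", output_dir]


def run_merge(pattern: str, output_dir: str) -> None:
    # Assumes DRAM.py is on PATH; check=True raises CalledProcessError on a non-zero exit.
    subprocess.run(merge_command(pattern, output_dir), check=True)


# shlex.join shows the equivalent shell command line, including the needed quotes.
print(shlex.join(merge_command("annotations/*", "merge_anno")))
# DRAM.py merge_annotations -i 'annotations/*' -o merge_anno
```

Passing a list instead of a single string with shell=True is the design choice that avoids glob expansion.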