donovan-h-parks / RefineM

A toolbox for improving metagenome-assembled genomes.
GNU General Public License v3.0

problem with taxon_profile databases #12

Closed zyyalice closed 6 years ago

zyyalice commented 6 years ago

Hi Donovan, I am trying to use RefineM on my server. It worked fine until the step "Removing contamination based on taxonomic assignments". When I reach the command ">refinem taxon_profile -c 40 /scaffold_stats.tsv ", I don't know what " and " refer to. What database should I use?

Could you give an answer and help me to continue my process?

Thanks a lot.

Looking forward to your reply.

Regards, Alice

donovan-h-parks commented 6 years ago

Hey Alice. I've updated the README file for RefineM to point to a reference database and taxonomy file that can be used. Please let me know if you run into any problems.

zyyalice commented 6 years ago

Hi Donovan,

Thank you for your reply.

Using the database you provided did not solve the problem.

I tried running each of the following commands:

refinem taxon_profile -c 40 gene_output_dir stats_output_dir_2/scaffold_stats.tsv genome_db.faa.dmnd gtdb_r80_taxonomy.2017-11-09.tsv taxon_profile_output_dir

refinem taxon_profile -c 40 gene_output_dir stats_output_dir_2/scaffold_stats.tsv genome_db.2017-11-09.genes.faa gtdb_r80_taxonomy.2017-11-09.tsv taxon_profile_output_dir

refinem taxon_profile -c 40 gene_output_dir stats_output_dir_2/scaffold_stats.tsv Archaea_bacteria_nrRefseq_prot.total.faa.dmnd gtdb_r80_taxonomy.2017-11-09.tsv taxon_profile_output_dir

The output messages are the same as below:

[2017-12-14 14:37:25] INFO: RefineM v0.0.20
[2017-12-14 14:37:25] INFO: refinem taxon_profile -c 40 gene_output_dir stats_output_dir_2/scaffold_stats.tsv genome_db.faa.dmnd gtdb_r80_taxonomy.2017-11-09.tsv taxon_profile_output_dir
[2017-12-14 14:37:25] INFO: Reading scaffold statistics.
[2017-12-14 14:37:43] INFO: Appending genome identifiers to all gene identifiers.
[2017-12-14 14:37:54] INFO: Reading taxonomic assignment of reference genomes.
[2017-12-14 14:37:59] INFO: Running diamond blastp with 40 processes (be patient!)
[2017-12-14 14:37:59] INFO: Creating taxonomic profile for each genome.
Unexpected error: <type 'exceptions.ValueError'>
Traceback (most recent call last):
  File "/usr/local/bin/refinem", line 409, in
    parser.parse_options(args)
  File "/usr/local/lib/python2.7/dist-packages/refinem/main.py", line 683, in parse_options
    self.taxon_profile(options)
  File "/usr/local/lib/python2.7/dist-packages/refinem/main.py", line 231, in taxon_profile
    options.tmpdir)
  File "/usr/local/lib/python2.7/dist-packages/refinem/taxon_profile.py", line 501, in run
    self.taxonomic_profiles(diamond_table_out, taxonomy)
  File "/usr/local/lib/python2.7/dist-packages/refinem/taxon_profile.py", line 121, in taxonomic_profiles
    subject_genome_id, subject_gene_id = hit.subject_id.split('~')
ValueError: need more than 1 value to unpack

Could you give me some suggestions about how to fix this?

Thanks a lot.

Looking forward to your reply.

Regards, Alice

donovan-h-parks commented 6 years ago

Hello Alice. I'm not sure what the issue is. I have verified that these databases work on my system, so I believe it is something specific to your genomes. If you can put all the input data you are providing to "refinem taxon_profile" somewhere for me, I would be happy to dig into the issue.
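
A note for readers who hit the same traceback: the failing line, `hit.subject_id.split('~')`, unpacks its result into exactly two names, so it raises this ValueError whenever a subject ID in the DIAMOND hit table contains no `~`. The Python sketch below is one way to look for such IDs. It is not part of RefineM. The function names, the example IDs, and the assumption that the subject ID is the second tab-separated column (as in BLAST-style tabular output) are illustrative only.

```python
def split_subject_id(subject_id, sep="~"):
    """Return (genome_id, gene_id), or None unless the ID has exactly one separator."""
    parts = subject_id.split(sep)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def find_malformed_subject_ids(lines, subject_col=1, sep="~"):
    """Collect subject IDs that would break a two-value unpack of split(sep)."""
    malformed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= subject_col:
            continue  # blank or truncated line: nothing to check
        subject_id = fields[subject_col]
        if split_subject_id(subject_id, sep) is None:
            malformed.append(subject_id)
    return malformed


if __name__ == "__main__":
    # Hypothetical hit-table rows: query ID, subject ID, percent identity.
    example_rows = [
        "scaffold_1_gene_1\tGENOME_A~gene_7\t98.5",
        "scaffold_1_gene_2\tprotein_without_genome_prefix\t91.2",
    ]
    print(find_malformed_subject_ids(example_rows))
```

If most or all subject IDs come back as malformed, that would suggest the sequence headers in the reference database do not follow the `genome_id~gene_id` form that the traceback implies.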

zyyalice commented 6 years ago

Hi Donovan,

Thank you for your reply.

When I use the command "refinem call_genes -c 40 bins ", the output message is "error: unrecognized arguments: gene_output_dir". So I deleted "bins" from the command and ran "refinem call_genes -c 40 ", and then it worked. Could changing the command have caused the problem in the following step?

Could you give me some suggestions?

Thanks a lot.

Looking forward to your reply.

Regards, Alice

donovan-h-parks commented 6 years ago

Hello Alice.

The error message when you ran "refinem call_genes -c 40 bins " simply indicates that you provided too many parameters. The "call_genes" command expects only a directory containing the bins and an output directory. Please note that the "<>" indicates that you should substitute what is appropriate for your data, e.g.: "refinem call_genes my_bins my_output_directory".
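
To illustrate why an extra parameter produces "unrecognized arguments", here is a minimal Python sketch of a command that takes exactly two positional arguments. It assumes argparse-style parsing and is not RefineM's actual parser: the program name, the argument names `bin_dir` and `output_dir`, and the long option `--cpus` are hypothetical. It uses `parse_known_args` so the leftover arguments can be inspected instead of ending the program with an error.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for a command with two required positionals."""
    parser = argparse.ArgumentParser(prog="call_genes_sketch")
    parser.add_argument("bin_dir", help="directory containing the bins")
    parser.add_argument("output_dir", help="directory for the output")
    parser.add_argument("-c", "--cpus", type=int, default=1)
    return parser


def parse_call_genes(argv):
    """Return (namespace, leftovers); leftovers are what parse_args would reject."""
    return build_parser().parse_known_args(argv)


if __name__ == "__main__":
    # Two positionals: nothing is left over.
    print(parse_call_genes(["-c", "40", "my_bins", "my_output_directory"]))
    # Three positionals: the first two are consumed, the third is left over.
    print(parse_call_genes(["-c", "40", "bins", "my_bins", "my_output_directory"]))
```

In the second call the parser fills `bin_dir` and `output_dir` with the first two values, and the third is left over, which is the situation that `parse_args` reports as an unrecognized argument.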

donovan-h-parks commented 6 years ago

Closed due to inactivity.