geronimp / graftM

GraftM - Rapid community profiles from metagenomes
http://geronimp.github.io/graftM/
GNU General Public License v3.0

GraftM graft on big metagenome error #277

Open steff1088 opened 2 years ago

steff1088 commented 2 years ago

Hi all,

I ran into issues running my mcrA package on a big 45 GB metagenome in fastq format. I can't really interpret the error message so I was wondering if you had any ideas. The package runs fine on other metagenomes in fasta and fastq format. @wwood @geronimp

GraftM 0.13.1

                            GRAFT

                   Joel Boyd, Ben Woodcroft

                                                     __/__
                                              ______|
      _- - _                         ________|      |_____/
       - -            -             |        |____/_
       - _     >>>>  -   >>>>   ____|
      - _-  -         -             |      ______
         - _                        |_____|
       -                                  |______

04/23/2022 01:38:19 PM INFO: Working on 11774.2.218915.CGAACTG-ACAGTTC.filter-METAGENOME
Traceback (most recent call last):
  File "/home/users/sbuessec/.local/bin/graftM", line 415, in <module>
    Run(args).main()
  File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/run.py", line 613, in main
    self.graft()
  File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/run.py", line 388, in graft
    diamond_db
  File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/timeit.py", line 10, in timed
    result = method(*args, **kw)
  File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 851, in aa_db_search
    hit_reads_orfs_fasta)
  File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 943, in search_and_extract_orfs_matching_protein_database
    hits
  File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 534, in _extract_from_raw_reads
    extern.run(extract_cmd, stdin='\n'.join(input_reads))
  File "/home/users/sbuessec/.local/lib/python3.6/site-packages/extern/__init__.py", line 41, in run
    raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command mfqe --output-uncompressed --fasta-read-name-lists /dev/stdin --input-fasta <(awk '{print ">" substr($0,2);getline;print;getline;getline}' '11774.2.218915.CGAACTG-ACAGTTC.filter-METAGENOME.fastq') --output-fasta-files '/tmp/_raw_extracted_reads.famb1zbzrb' returned non-zero exit status 101.
STDERR was: b"[2022-04-23T20:45:46Z INFO mfqe] Read in 223 read names from /dev/stdin\n[2022-04-23T20:45:46Z INFO mfqe] Iterating input FASTQ file\n[2022-04-23T20:47:38Z INFO mfqe] Extracted 446 reads from 120829412 total\nthread 'main' panicked at 'Mismatching numbers of read names were observed. Expected:\n[223]\nbut found\n[446]', src/main.rs:333:9\nnote: run with RUST_BACKTRACE=1 environment variable to display a backtrace\n"
STDOUT was: b''

wwood commented 2 years ago

Hi,

I can't tell exactly since I don't have the command you used or the data, but the error message (found 446 reads when expected 223) suggests to me that the read sets are interleaved, since 223*2=446.

Does that help?
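To illustrate the arithmetic behind this diagnosis, here is a minimal Python sketch of why a list of N read names can pull 2N records out of an interleaved file. It is not mfqe's actual code: the helpers `read_names` and `count_extracted` are hypothetical, and the sketch assumes that a record matches when the header text before the first whitespace equals a requested name.

```python
from io import StringIO


def read_names(fastq_handle):
    """Yield the read name (header without '@', up to the first whitespace)
    for each 4-line FASTQ record. Assumes well-formed, unwrapped FASTQ."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 0:
            yield line[1:].split()[0]


def count_extracted(fastq_handle, wanted):
    """Count the records whose name is in `wanted` (one count per record)."""
    wanted = set(wanted)
    return sum(1 for name in read_names(fastq_handle) if name in wanted)


# Toy interleaved file: the forward and reverse mates share the names r1 and r2.
interleaved = StringIO(
    "@r1 1:N:0\nACGT\n+\nIIII\n"
    "@r1 2:N:0\nTGCA\n+\nIIII\n"
    "@r2 1:N:0\nGGCC\n+\nIIII\n"
    "@r2 2:N:0\nCCGG\n+\nIIII\n"
)

# One requested name matches both mates, so twice as many records come back.
hits = count_extracted(interleaved, ["r1"])
print(hits)  # 2
```

Scaled up, 223 requested names matching both mates of each pair would give the 446 extracted reads reported in the error.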

steff1088 commented 2 years ago

Thank you very much for the quick response.

The command I used was:

graftM graft --threads 8 --evalue 0.000000001 --forward 11774.2.218915.CGAACTG-ACAGTTC.filter-METAGENOME.fastq --graftm_package 500PSI_mcrAs_refined.gpkg --output_directory GraftM_output_11774.2.218915.CGAACTG-ACAGTTC_500PSI_mcrAs_refined_package --force

If the reads are interleaved, what can I do to make them compatible with the graft command?

wwood commented 2 years ago

Hi,

You can use the --interleaved flag instead of --forward. You can easily tell whether the reads are interleaved by looking at the head of the file: an interleaved file will have 2 reads with the same name. Alternatively, you can split the file up; there are plenty of tools out there for doing that.

Ben
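Both suggestions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of GraftM: the function names (`fastq_records`, `base_name`, `looks_interleaved`, `split_interleaved`) are hypothetical, the input is assumed to be an uncompressed FASTQ with 4 lines per record, and mates are assumed to share a name up to the first whitespace, optionally followed by a /1 or /2 suffix.

```python
from itertools import islice


def fastq_records(handle):
    """Yield 4-line FASTQ records as lists of lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def base_name(header):
    """Read name without '@', trailing comment, or a /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def looks_interleaved(path, pairs_to_check=100):
    """Return True if the first records come in consecutive same-named pairs."""
    with open(path) as handle:
        records = list(islice(fastq_records(handle), 2 * pairs_to_check))
    if len(records) < 2 or len(records) % 2:
        return False
    return all(
        base_name(a[0]) == base_name(b[0])
        for a, b in zip(records[0::2], records[1::2])
    )


def split_interleaved(path, forward_path, reverse_path):
    """Write alternating records to two files and return the number of pairs."""
    pairs = 0
    with open(path) as src, open(forward_path, "w") as fwd, \
            open(reverse_path, "w") as rev:
        records = fastq_records(src)
        for first in records:
            second = next(records, None)
            if second is None:
                raise ValueError("odd number of records; file is not interleaved")
            fwd.writelines(first)
            rev.writelines(second)
            pairs += 1
    return pairs
```

`looks_interleaved` only inspects the head of the file, so it is a quick sanity check rather than a full validation. For a 45 GB file, an established splitting tool is likely to be much faster than this sketch.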

steff1088 commented 2 years ago

Thanks Ben, that did the trick!

-steffen