Open aoliver44 opened 2 months ago
Hello Andrew,
Have you successfully run this RAT
tool yet? I encountered an error related to the input. From what I understand, my FASTQ
headers must be present in the contigs. My contigs were generated by MEGAHIT
, and they only have names like >k127_10000
, without any FASTQ names. What do you think I should do?
Best regards,
Nhu
./CAT_pack reads --mode cr -c contigs.fa -1 R_1.fastq -2 R_2.fastq -d /path/to/20240422_CAT_nr/db/ -t /path/to/200422_CAT_nr/tax/ -o lala -p out.CAT.predicted_proteins.faa -a out.CAT.alignment.diamond --c2c out.CAT.contig2classification.txt
<frozen genericpath>:39: RuntimeWarning: bool is used as a file descriptor
# CAT_pack v6.0.
[2024-10-11 09:41:47] samtools found: samtools 1.21.
[2024-10-11 09:41:47] bwa found: Version: 0.7.18-r1243-dirty.
RAT is running. Mapping reads against assembly with bwa mem.
[2024-10-11 09:41:47] Running bwa mem for read mapping. File fastq.bwamem.sorted will be generated.Do not forget to cite bwa mem and samtools when using RAT in your publication!
[2024-10-11 09:41:47] Contigs fasta is already indexed.
[2024-10-11 09:41:47] Running bwa mem...
...
[main] Version: 0.7.18-r1243-dirty
[main] CMD: bwa mem -t 24
[main] Real time: 118.429 sec; CPU: 2701.163 sec
[2024-10-11 09:43:52] Sorting bam file...
[bam_sort_core] merging from 0 files and 24 in-memory blocks...
[2024-10-11 09:44:00] Read mapping done!
[2024-10-11 09:44:00] contig2classification file supplied. Processing contig classifications.
[2024-10-11 09:44:00] Loading file 20240422_CAT_nr/tax/nodes.dmp.
[2024-10-11 09:44:03] Processing mapping file(s).
[2024-10-11 09:44:32] Chosen mode: cr. Classifying unclassified contigs and unmapped reads with diamond if no classification file is supplied.
[2024-10-11 09:44:32] No unmapped2classification file supplied .Grabbing unmapped and unclassified sequences...
[2024-10-11 09:44:33] Contigs written! Appending forward reads...
Traceback (most recent call last):
File "CAT_pack/CAT_pack/./CAT_pack", line 101, in <module>
main()
~~~~^^
File "CAT_pack/CAT_pack/./CAT_pack", line 85, in main
reads.run()
~~~~~~~~~^^
File "CAT_pack/CAT_pack/reads.py", line 435, in run
make_unclassified_seq_fasta(reads_files[0], unmapped_reads['fw'],
~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
uncl_unm_fasta, 'fastq', 'a','_1')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "CAT_pack/CAT_pack/reads.py", line 1260, in make_unclassified_seq_fasta
fasta_dict[seq]))
~~~~~~~~~~^^^^^
KeyError: 'FT100012261L1C013R00202177739'
Thank you for such a fantastic tool.
I have been trying to get RAT to work. Your documentation specifies "Currently, RAT supports single read files as well as paired-end read files. Interlaced read files are currently not supported."
reads --mode cr -c 5002.filtered.fasta -1 reads_to_map.fastq -d $DB -t $TAX --c2c 5002_CAT.contig2classification.txt
But that throws an error:
Im terrible at python...but in looking at reads.py, I could not see how the
-2
flag could be optional (ie in the case of single read files).For troubleshooting, I tried supplying an empty
-2 reads_to_map_2.fastq
file seemed to get it to run further. I think this is because the process looks like it appends the reads and unidentified contigs all in one big file, and the pairing doesn't really matter?