Open florenmartino opened 5 days ago
Hi @florenmartino,
For your genome list file, here's the description from the help page:
-gl GENOME_LIST, --genome_list GENOME_LIST
Reference metagenome list, tsv file, the first column
is species/strain name, the second column is the
reference genome fasta/fastq file directory, the third
column is optional, if provided, it contains the
expected abundance (sum up to 100)
From the formatting you pasted above, it looks like the columns on each line may be separated by spaces instead of tabs. That's worth double-checking first. At minimum, you need the first two columns described above.
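One quick way to check this is a small script along the lines below. It is only a sketch and not part of NanoSim: `check_genome_list` and `spaces_to_tabs` are made-up helper names, every non-blank line is treated as a data row (whether a header row is accepted is not something this sketch knows), and the conversion assumes that names and paths contain no spaces.

```python
from typing import List


def check_genome_list(text: str) -> List[str]:
    """Hypothetical helper: report lines that are not 2 or 3 tab-separated columns."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if not 2 <= len(fields) <= 3:
            problems.append(
                f"line {number}: expected 2 or 3 tab-separated columns, found {len(fields)}"
            )
    return problems


def spaces_to_tabs(text: str) -> str:
    """Hypothetical helper: rejoin whitespace-separated fields with tabs.

    Assumes names and paths contain no spaces, as in the list pasted in this thread.
    """
    rows = (line.split() for line in text.splitlines() if line.strip())
    return "".join("\t".join(fields) + "\n" for fields in rows)


# Example with one entry from the list in this thread, separated by a space.
pasted = "AB008394_Torque_teno_virus_1 References.split/AB008394.fasta\n"
print(check_genome_list(pasted))                  # reports line 1
print(check_genome_list(spaces_to_tabs(pasted)))  # no problems reported
```

To use it on a real file, read the file contents into a string, pass them to `check_genome_list`, and write the output of `spaces_to_tabs` to a new file if problems are reported.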
In terms of abundance, it is optional for the characterization stage, but if you want to quantify, you can use the read_analysis.py quantify mode; there is more description of that one in the README.md.
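If you do decide to fill in the optional third column, the help text above says the expected abundances should sum to 100. The sketch below shows one way to write such a list; it is not NanoSim code, `write_genome_list` is a hypothetical helper, and the names, paths, and weights in the example are placeholders.

```python
import csv
import io
from typing import Dict, Tuple


def write_genome_list(entries: Dict[str, Tuple[str, float]]) -> str:
    """Hypothetical helper: build three-column TSV text with abundances scaled to 100.

    `entries` maps a species/strain name to (reference path, relative weight).
    """
    total = sum(weight for _, weight in entries.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for name, (path, weight) in entries.items():
        # Scale each weight so that the third column sums to 100.
        writer.writerow([name, path, f"{100 * weight / total:.4f}"])
    return buffer.getvalue()


# Placeholder entries for illustration only.
example = {
    "species_a": ("refs/species_a.fasta", 1.0),
    "species_b": ("refs/species_b.fasta", 3.0),
}
print(write_genome_list(example), end="")
```

Using the `csv` module with a tab delimiter avoids accidentally mixing spaces and tabs when the file is assembled by hand.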
Hope that helps - thank you for your interest in NanoSim! Lauren
Hi there! I'm new to this tool, so sorry in advance if I'm asking something silly.
I'm trying to use the read_analysis.py script for metagenome analysis, but I'm encountering issues with the genome_list input. Here's what I’ve done so far:
My genome_list.tsv file is structured like this:
Identifier FilePath
AB008394_Torque_teno_virus_1 References.split/AB008394.fasta
AB017613_Torque_teno_virus_16 References.split/AB017613.fasta
AB025946_Torque_teno_virus_19 References.split/AB025946.fasta
AB026929_Torque_teno_mini_virus_6 References.split/AB026929.fasta
AB026931_Torque_teno_mini_virus_1 References.split/AB026931.fasta
AB028668_Torque_teno_virus_15 References.split/AB028668.fasta
AB037926_Torque_teno_virus_14 References.split/AB037926.fasta
AB038621_Torque_teno_virus_29 References.split/AB038621.fasta
AB038627_Torque_teno_mini_virus_7 References.split/AB038627.fasta
AB038629_Torque_teno_mini_virus_2 References.split/AB038629.fasta
AB038630_Torque_teno_mini_virus_3 References.split/AB038630.fasta
AB038631_Torque_teno_mini_virus_9 References.split/AB038631.fasta
AB041957_Torque_teno_virus_4 References.split/AB041957.fasta
AB041958_Torque_teno_virus_26 References.split/AB041958.fasta
AB041959_Torque_teno_virus_25 References.split/AB041959.fasta
AB041960_Torque_teno_tamarin_virus References.split/AB041960.fasta
...
I ran the command:
read_analysis.py metagenome -i /path/to/myfile.fastq.gz -gl genome_list.tsv --no_model_fit -o nanosim_output -t 16
The script failed with the following error:
(nanosim) [fmarti34@login02 NANOSIM-TEST]$ read_analysis.py metagenome -i /home/fmarti34/data_sclipma1/Anellome_outputs_hash/AS1_12_mo./AS1_12_mo..fastq.gz -gl genome_list.tsv --no_model_fit -o nanosim_1_test -t 16
Running the code with following parameters:
infile /home/fmarti34/data_sclipma1/Anellome_outputs_hash/AS1_12_mo./AS1_12_mo..fastq.gz
genome_list genome_list.tsv
g_alnm
prefix nanosim_1_test
num_threads 16
model_fit False
chimeric False
homopolymer False
fastq False
quantification False
2024-11-21 10:29:43: Read pre-process
2024-11-21 10:31:32: Processing reference genome
Traceback (most recent call last):
  File "/home/fmarti34/.conda/envs/nanosim/bin/read_analysis.py", line 879, in
    main()
  File "/home/fmarti34/.conda/envs/nanosim/bin/read_analysis.py", line 675, in main
    metagenome_list[species] = {'path': info[1]}
Questions:
Thank you in advance!
Best regards,
Flor