Species_num error - Githubissues

wrightaprilm commented 2 months ago

Hi all,

I've been trying to work with MiFish with a custom amplicon reference database built using makeblastdb, and a results file in .fasta format.

I've been using the command:

mifish seq/ database/crabtest.fasta

All the dependencies are found, but I get this error:

Detect your data as
#########
    zip warning: name not matched: ./MiFishResult/Sample-*/01_filter_fastq_and_merge/*.html

zip error: Nothing to do! (./MiFishResult/QC.zip)
Traceback (most recent call last):
  File "/home/labaccount/miniconda3/envs/MiFish/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/home/labaccount/projects/mifish_test/MiFish/mifish/cmd/mifish.py", line 82, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/home/labaccount/projects/mifish_test/MiFish/mifish/core/pipeline.py", line 395, in runMiFish
    if simple_result == False and 'species_num' in stat_data and stat_data['species_num'] > 3:
UnboundLocalError: local variable 'stat_data' referenced before assignment

I'm not sure how to interpret this. Could it mean that there are too few matches in the amplicon database?
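For context, an UnboundLocalError like the one in the traceback usually means a local variable is only assigned on some code paths, and the path actually taken skipped the assignment. The snippet below is a minimal sketch of that general pattern, not MiFish's actual code: the function names and the loop are hypothetical, and only the variable name stat_data and the 'species_num' key come from the traceback.

```python
def summarize(samples):
    """Hypothetical stand-in for a pipeline step; not MiFish's real code."""
    for sample in samples:
        # stat_data is only bound if the loop body runs at least once.
        stat_data = {"species_num": len(sample)}
    # With an empty `samples`, stat_data was never assigned, so this raises.
    return stat_data.get("species_num", 0)


def try_summarize(samples):
    """Run summarize() and report the error as text instead of crashing."""
    try:
        return summarize(samples)
    except UnboundLocalError as exc:
        return f"UnboundLocalError: {exc}"


print(try_summarize([["species_a", "species_b"]]))  # one sample: no error
print(try_summarize([]))                            # no samples: error text
```

Under that assumption, the error would point to empty upstream results rather than to a problem with the species count itself.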

billzt commented 2 months ago

Hi @wrightaprilm, please add the -s option (skip downstream analysis) and the -k option (keep temporary files), then list the files under ./MiFishResult/ here. It would also help if you could share the ./MiFishResult/ directory (here or through a third-party file-sharing service).

wrightaprilm commented 1 month ago

Sorry for my late reply. Directory attached. MiFishResult.zip

Basically all the files are empty, so I'm wondering if one or both of my inputs are wrong. Those are attached as well. I've been using the 'all.fasta' database, in case the problem was that the sequences did not belong to one of the smaller taxonomic groups.

database.zip sequences.zip

billzt commented 1 month ago

@wrightaprilm Thank you for your data. In the archive sequences.zip, I only found one file, S12-1-3_R2_001.fastq, and I think there should be a matching file, S12-1-3_R1_001.fastq. The two have to be used together. Could you please share it?
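A quick way to spot a missing mate file before running the pipeline is sketched below. It assumes Illumina-style names containing _R1_ or _R2_, as in the files above, and ignores a trailing .gz. The helper find_unpaired is hypothetical and is not how MiFish itself detects pairs.

```python
from pathlib import Path


def find_unpaired(seq_dir):
    """Return (present, missing_mate) name stems for FASTQ files lacking a mate.

    Assumes names like S12-1-3_R1_001.fastq or S12-1-3_R1_001.fastq.gz.
    """
    stems = set()
    for path in Path(seq_dir).iterdir():
        if path.is_file() and ".fastq" in path.name:
            # Drop ".fastq" and anything after it (for example ".gz").
            stems.add(path.name.split(".fastq")[0])

    missing = []
    for stem in sorted(stems):
        if "_R1_" in stem:
            mate = stem.replace("_R1_", "_R2_", 1)
        elif "_R2_" in stem:
            mate = stem.replace("_R2_", "_R1_", 1)
        else:
            continue  # not an R1/R2-style name, nothing to check
        if mate not in stems:
            missing.append((stem, mate))
    return missing
```

Calling find_unpaired("../sequences/") on a directory that holds only S12-1-3_R2_001.fastq would report S12-1-3_R1_001 as the missing mate.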

camrynbigelow commented 1 month ago

Hello, this is @wrightaprilm's student. We added the second file, specified our forward and reverse primers, and the run completed. However, it tells me that the sample "did not pass read length filter". I tried using the -m option to lower the minimum read length to 50, then to 1 just to check, but it gave me the same result. Is there any other way I can change the filter parameters? Thank you!

(MiFish) labaccount@system76-pc:~/projects/mifish_test/MiFish$ mifish -s -k ../sequences/ ../database/all.fasta -f GGWACWGGWTGAACWGTWTAYCCYCC -r TAIACYTCIGGRTGICCRAARAAYCA
Warning: Directory ./MiFishResult has already existed. All the files within it would be deleted
Detect your data as
#########
Group1: 1 samples
Sample S12_13: read type = se
#########
Sample S12_13 Step 0: Decompress
Sample S12_13 Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample S12_13 Step 2: filter read length and remove primers
Sample S12_13 has not passed read length filter. Only has 0 reads. Skip
  adding: S12_13.html (deflated 79%)

billzt commented 1 month ago

Hi @camrynbigelow , Could you share the second file S12-1-3_R1_001.fastq ?

camrynbigelow commented 1 month ago

Yes, the second file is attached. S12-1-3_R1_001.fastq.gz

billzt commented 1 month ago

@camrynbigelow Got it. Please put S12-1-3_R1_001.fastq.gz and S12-1-3_R2_001.fastq.gz together in the sequences directory before running.

Besides, the main problem is that R1 and R2 failed to merge. Could you confirm that the reads from R1 and R2 overlap with each other? In my run, the log file MiFishResult/Sample-S12_1_3_/01_filter_fastq_and_merge/S12_1_3_.flash.log shows that almost no reads merged successfully:

[FLASH] Read combination statistics:
[FLASH]     Total reads:      415156
[FLASH]     Combined reads:   718
[FLASH]     Uncombined reads: 414438
[FLASH]     Percent combined: 0.17%
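The percentage in the log can be recomputed from the read counts. The sketch below parses the three count lines exactly as they appear above. The regular expression and the helper names are my own assumptions and only cover this part of the FLASH log.

```python
import re

FLASH_LOG = """\
[FLASH] Read combination statistics:
[FLASH]     Total reads:      415156
[FLASH]     Combined reads:   718
[FLASH]     Uncombined reads: 414438
[FLASH]     Percent combined: 0.17%
"""

# Matches only the three integer count lines shown in the log above.
COUNT_LINE = re.compile(
    r"\[FLASH\]\s+(Total reads|Combined reads|Uncombined reads):\s+(\d+)"
)


def parse_flash_stats(log_text):
    """Collect the read counts from the statistics section of a FLASH log."""
    stats = {}
    for line in log_text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def percent_combined(stats):
    """Share of read pairs that FLASH merged, in percent."""
    return 100.0 * stats["Combined reads"] / stats["Total reads"]


stats = parse_flash_stats(FLASH_LOG)
print(f"{percent_combined(stats):.2f}%")  # 718 / 415156, prints 0.17%
```

With only 718 of 415156 pairs merged, almost nothing is left for the read length filter in the next step.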

I checked the first read pair, R1 is:

@A01940:372:GW240426000:4:1101:1497:1266 1:N:0:ACCCAGCA+TAAGATTA
AGGCTTGGAACAGGTTGAACAGTTTACCCTCCTTTAAGCAATTTGTCAGGCCATCCTGGCGCTGCTGTTGATATGGCCATATTTAGCCTGCACCTTGCAGGTATGTCCTCTATTTTAGGGGCAATTAATATGATTGTGACTATATTTAAC
+

and R2 is

@A01940:372:GW240426000:4:1101:1497:1266 2:N:0:ACCCAGCA+TAAGATTA
GCCTCCTAGACTTCGGGATGGCCGAAAAACCAAAATAAATGCTGAAATAGAATAGGGTCGCCGCCCCCGTCGGGTTTAAAAAAGTGCGTGCCAAAATTCCTGTCAGTAAGTAGCATAGTAATTGCTCCGGCTAGTACTGGCAAAGCTAGT
+

I found that R1 matches region 329156~329293 (minus strand) of CP104166.1, and R2 matches region 328935~329078 (plus strand) of CP104166.1, which means they do not overlap.
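The non-overlap can be checked with simple interval arithmetic on the two matched regions. This is a minimal sketch that assumes the coordinates are 1-based and inclusive, as BLAST reports them. The helper overlap_length is hypothetical.

```python
def overlap_length(a, b):
    """Signed overlap of two inclusive intervals given as (start, end).

    A positive value is the number of shared positions. Zero or a negative
    value means no overlap, and its absolute value is the size of the gap.
    """
    return min(a[1], b[1]) - max(a[0], b[0]) + 1


r1_region = (329156, 329293)  # R1 match on CP104166.1 (minus strand)
r2_region = (328935, 329078)  # R2 match on CP104166.1 (plus strand)

print(overlap_length(r1_region, r2_region))  # prints -77
```

The result of -77 means the two matched regions are separated by 77 bases (positions 329079 to 329155), so FLASH has no shared sequence with which to merge this pair.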

billzt / MiFish

Species_num error #13