asrivathsan / ONTbarcoder

Issue running on Windows OS #3

Closed MaestSi closed 3 years ago

MaestSi commented 3 years ago

Hi, I downloaded ONTBarcoder_0.1.8_exe.win-amd64.zip, unzipped it, and started the ONTbarcoder executable. The GUI starts and I am able to select input files. However, just after I start the analysis, the program crashes and quits. I tested both with my own already demultiplexed data and with the Flongle dataset (Mixed Diptera). Do you know how I could solve the issue? I am running on an ASUS laptop with 16 GB RAM, an i7 processor, and Windows 10 Pro. Thanks in advance, Simone

asrivathsan commented 3 years ago

Hi Simone,

One issue I have observed on Windows is the total path length, which the OS limits. Would you mind moving the software up in the folder hierarchy so that the total path length stays under 260 characters?

If this does not resolve the problem, could you run the software from cmd and paste the error message? Thank you, Amrita
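
A minimal sketch of how one might check for over-long paths before moving the folder. It assumes the classic Windows `MAX_PATH` limit of 260 characters; the function name `long_paths` is hypothetical and is not part of ONTbarcoder.

```python
import os

# Classic Windows MAX_PATH limit (assumption: long-path support is not enabled).
MAX_PATH = 260


def long_paths(root, limit=MAX_PATH):
    """Return every file or directory under root whose absolute path length is >= limit."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                offenders.append(full)
    return offenders
```

Calling `long_paths` on the unzipped software folder and on the input/output folders and getting an empty list back would suggest that path length is not the cause.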

MaestSi commented 3 years ago

Hi, I tried moving the software folder to "C:\", but the same issue still occurs. This is the error I got when running in cmd.

C:\ONTBarcoder_0.1.8_exe.win-amd64>ONTbarcoder.exe
QLayout: Attempting to add QLayout "" to OptWindow "", which already has a layout
/C:/Users/simon/Desktop/Brunei_2018
[True, True, True, True, True, True, True, True, True, True]
[25, 50, 100, 200, 500]
Traceback (most recent call last):
  File "ONTbarcoder.py", line 1305, in run
  File "ONTbarcoder.py", line 1275, in subset_bylength
IndexError: list index out of range
QObject::~QObject: Timers cannot be stopped from another thread
QWaitCondition: Destroyed while threads are still waiting

Simone

asrivathsan commented 3 years ago

Thanks. The error seems to occur at the point where reads are subset by a length criterion, and the code is not finding a line after ">" in the FASTA file. Could you check whether the same error occurs with the Mixed Diptera dataset?
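
To illustrate the failure mode described here, the following is a sketch only: it is not the actual code of `subset_bylength`, which is not shown in this thread. Both function names are hypothetical. The naive reader assumes every ">" header is followed by a sequence line and raises `IndexError` when that line is missing; the defensive reader skips such headers instead.

```python
def read_fasta_pairs(lines):
    """Naive reader: assumes each '>' header is followed by one sequence line."""
    records = []
    for i, line in enumerate(lines):
        if line.startswith(">"):
            # Raises IndexError when the header is the last line of the input.
            records.append((line[1:].strip(), lines[i + 1].strip()))
    return records


def read_fasta_safe(lines):
    """Defensive reader: skips headers that have no sequence line after them."""
    records = []
    for i, line in enumerate(lines):
        if line.startswith(">"):
            if i + 1 < len(lines) and not lines[i + 1].startswith(">"):
                records.append((line[1:].strip(), lines[i + 1].strip()))
    return records
```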

MaestSi commented 3 years ago

Here is the error with the Mixed Diptera file:

C:\ONTBarcoder_0.1.8_exe.win-amd64>ONTbarcoder.exe
QLayout: Attempting to add QLayout "" to OptWindow "", which already has a layout
/C:/Users/simon/Desktop/DatasetA_Flongle_MixedDipteraSubsample_small-20210524T072242Z-001/DatasetA_Flongle_MixedDipteraSubsample_small
/C:/Users/simon/Desktop/MixedDipteraSubsample_demfile
[True, True, True, True, True, True, True, True, True, True]
[25, 50, 100, 200, 500]
C:\ONTBarcoder_0.1.8_exe.win-amd64/lib\Bio\Seq.py:2715: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
Traceback (most recent call last):
  File "ONTbarcoder.py", line 1305, in run
  File "ONTbarcoder.py", line 1275, in subset_bylength
IndexError: list index out of range
QObject::~QObject: Timers cannot be stopped from another thread
QWaitCondition: Destroyed while threads are still waiting

However, I noticed that after uploading the FASTQ file, the software GUI highlights MODE2 and writes "Input is demultiplexed reads". Then, if I upload the demultiplexing file, the software GUI highlights MODE1 and writes "You have dragged a demultiplexing file, but a FASTQ file remains". If I also re-upload the FASTQ file and start the analysis, it crashes as before. P.S.: is the software supposed to perform a FASTQ to FASTA conversion? Otherwise I don't understand why it is looking for ">". Thanks, Simone

asrivathsan commented 3 years ago

With regard to FASTQ vs FASTA: demultiplexed reads are expected to be in FASTA format, in a specific folder. For demultiplexing, on the other hand, an input FASTQ is expected. We will be uploading a version that also accepts FASTA at this stage, but currently we support FASTQ for demultiplexing. Therefore MODE1 works with a FASTQ file plus a demultiplexing file, and MODE2 works with a folder containing FASTA files. If the folder contained FASTQ files, this error could happen, as the software may be misreading the quality scores.
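
A quick way to confirm which format an input file is in, before choosing MODE1 or MODE2, is to look at its first non-blank line: FASTA records start with ">" and FASTQ records start with "@". This is a rough sketch for uncompressed text files; the helper name `sniff_format` is hypothetical and is not an ONTbarcoder function.

```python
def sniff_format(path):
    """Guess 'fasta', 'fastq', or 'unknown' from the first non-blank line of a text file."""
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "unknown"  # empty file
```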

It still isn't clear to me why Mixed Diptera is giving the issue, though, because it has run smoothly so far. If you could specify the full path of the output folder and its current contents, I will try to troubleshoot this.

asrivathsan commented 3 years ago

One thing I noticed is that the first file dropped is named /C:/Users/simon/Desktop/DatasetA_Flongle_MixedDipteraSubsample_small-20210524T072242Z-001/DatasetA_Flongle_MixedDipteraSubsample_small

This doesn't seem to be a FASTQ file.

This video shows how to drag in the files to enable MODE1: https://www.youtube.com/watch?v=mK5p9oAHHqs&t=32s

MaestSi commented 3 years ago

Hi Amrita, you are right, I mistakenly uploaded the folder instead of the FASTQ file, my fault. Still, on my dataset the analysis also fails when I upload FASTA files, and this is the error.

C:\ONTBarcoder_0.1.8_exe.win-amd64>ONTbarcoder.exe
QLayout: Attempting to add QLayout "" to OptWindow "", which already has a layout
/C:/Users/simon/Desktop/Brunei_2018
[True, True, True, True, True, True, True, True, True, True]
[25, 50, 100, 200, 500]
C:\ONTBarcoder_0.1.8_exe.win-amd64/lib\Bio\Seq.py:2715: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BC01
BC03
BC02
BC05
BC04
BC07
BC06
[25, 50, 100, 200, 500]
Traceback (most recent call last):
  File "ONTbarcoder.py", line 1305, in run
  File "ONTbarcoder.py", line 1267, in subset_bylength
IOError: [Errno 2] No such file or directory: 'C:/Users/simon/Desktop/Brunei_2018\\BC01_all.fa'
QObject::~QObject: Timers cannot be stopped from another thread
QWaitCondition: Destroyed while threads are still waiting

Simone

asrivathsan commented 3 years ago

Ah, this is more traceable. The demultiplexed files from our pipeline have "_all.fa" appended to their names (added because of parallelization), and the downstream step expects this. I will upload a version soon that accepts more flexible input file names, for users interested in supplying externally demultiplexed files.
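
With version 0.1.8, one possible workaround is to rename externally demultiplexed files so that they match the `<sample>_all.fa` pattern seen in the traceback (`BC01_all.fa`). The sketch below assumes the input files are named after the sample with a `.fa` or `.fasta` extension; the function name `add_all_suffix` and its parameters are hypothetical. It defaults to a dry run that only returns the planned renames.

```python
from pathlib import Path


def add_all_suffix(folder, extensions=(".fa", ".fasta"), dry_run=True):
    """Plan (or perform) renames such as BC01.fasta -> BC01_all.fa inside folder."""
    plan = []
    # sorted() materializes the listing so renaming does not disturb iteration.
    for src in sorted(Path(folder).iterdir()):
        if not src.is_file() or src.suffix.lower() not in extensions:
            continue
        if src.name.endswith("_all.fa"):
            continue  # already in the expected form
        dst = src.with_name(src.stem + "_all.fa")
        if dst.exists():
            continue  # never overwrite an existing file
        plan.append((src.name, dst.name))
        if not dry_run:
            src.rename(dst)
    return plan
```

Running it once with `dry_run=True` to inspect the returned list, then again with `dry_run=False`, keeps the rename reviewable.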

MaestSi commented 3 years ago

Thanks. In the meantime I renamed the FASTA files and the analysis apparently completed successfully. However, I noticed that the consensus sequences were all in the file Remaining.fa (while Allbarcodes.fa was empty), and alignment identities to the corresponding Sanger sequences were very poor. Probably I should tune some parameters to adjust for R9.4 sequencing chemistry and for a slightly larger amplicon length (on average, but each sample is quite different), or do you suspect something went wrong with the analysis? The alignment identities obtained by BLASTing each consensus sequence against its Sanger sequence are reported below.

results_BC06.txt
 Identities = 575/697 (82%), Gaps = 121/697 (17%)
***************************
results_BC03.txt
 Identities = 468/575 (81%), Gaps = 105/575 (18%)
 Identities = 454/565 (80%), Gaps = 106/565 (19%)
***************************
results_BC04.txt
No alignment found
***************************
results_BC07.txt
 Identities = 352/445 (79%), Gaps = 85/445 (19%)
***************************
results_BC01.txt
 Identities = 295/363 (81%), Gaps = 68/363 (19%)
 Identities = 197/235 (84%), Gaps = 37/235 (16%)
 Identities = 212/268 (79%), Gaps = 51/268 (19%)
***************************
results_BC02.txt
 Identities = 493/641 (77%), Gaps = 143/641 (22%)
 Identities = 365/472 (77%), Gaps = 104/472 (22%)

Thanks, Simone
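
For reading the BLAST summary lines above, a small arithmetic sketch: the alignment length is the sum of identities, mismatches, and gap positions, so the mismatch count can be recovered as length minus identities minus gaps. The helper name `summarize` is hypothetical, and the regular expression only targets lines in the exact form shown above.

```python
import re

# Matches lines such as " Identities = 575/697 (82%), Gaps = 121/697 (17%)".
LINE = re.compile(r"Identities = (\d+)/(\d+) \(\d+%\), Gaps = (\d+)/(\d+) \(\d+%\)")


def summarize(line):
    """Return (identity_fraction, gap_fraction, mismatches) for one summary line, or None."""
    m = LINE.search(line)
    if m is None:
        return None
    ident, aln_len, gaps, gap_len = map(int, m.groups())
    mismatches = aln_len - ident - gaps
    return ident / aln_len, gaps / gap_len, mismatches
```

For the BC06 line, this gives 697 - 575 - 121 = 1 mismatch, so nearly all non-identical columns in that alignment are gaps.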

asrivathsan commented 3 years ago

Hi Simone,

Are your sequences post primer removal, retaining only those reads where primers were found? I am wondering if this was done using ONT's custom kit, because by default primers wouldn't be excluded, I think. This leads to two considerations: 1) Primers, if they contain ambiguities, will cause alignment problems, since the consensus is based on an MSA. 2) The software is tuned to select reads based on sequence lengths, which are the lengths after primer removal. Is this being accounted for in the settings menu? Otherwise you may be selecting the messier reads. If the primers don't have ambiguities, the length settings should include the primer length, and the translation checks wouldn't work directly, because the primer sequence would be included.

The software runs fine with R9.4 and HAC data here. We transitioned to that in the last year, but I agree that with older data and fast basecalls, settings may have to be tuned.

MaestSi commented 3 years ago

No, these are reads generated in 2018 in Brunei using the ONT EXP-PBC001 multiplexing kit, which were re-basecalled with Guppy v3.6 high-accuracy, demultiplexed, and length filtered just to remove sequences much shorter or longer than expected. So adapters were trimmed, but PCR primers were not. I usually remove PCR primers from the consensus sequence (not from the reads) because Nanopolish, which I usually use in my pipeline, needs some flanking sequences. Moreover, in this run I had 2 different primer pairs, but I understand this may cause an issue with ONTbarcoder. So thanks for the troubleshooting, I am going to close the issue. Simone