Open jkomyno opened 3 years ago
Hi @jkomyno, it seems that the error lies in line 5, so could you check what line 5 looks like?
I've added the simulated fastq file here (I'm sorry, I thought I had already linked it in the original issue, but I forgot).
Line 5 is the following:
@ENA|U00096|U00096_2138149;aligned_4_R_13_2748_29
It looks like a normal header generated by NanoSim. My intuition is that the ; is causing the problem. I quickly checked the fqtools manual and it seems you can specify which character is expected. So if ; is not in the default list, the header is considered invalid. That being said, I'm not entirely sure what went wrong. And since I'm busy with my thesis these days, could you try that and let me know how it works? Thanks!
Hi, I ran fqtools -p ';' validate ./data/simulated/simulated_aligned_reads.fastq, but I get the same error.
I thought you said there was no error with unaligned reads before?
That was a typo, sorry. I edited the comment so it's clearer.
Hi @cheny19, any update?
Hi @jkomyno, sorry for the lack of updates recently. I don't know much about the validity criteria of fqtools. Based on your comment in isONclust, it seems that the tool didn't read the quality score properly.
@theottlo, do you have any thoughts about this?
Hi @jkomyno, I apologize for the delay! I was wondering which version of NanoSim you were using to simulate the reads. It looks like the sequence and quality score lengths are different in the aligned fastq file, which is a known bug in NanoSim v2.6.0 and is fixed in the v3.0.0 pre-release.
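One way to check for the sequence/quality length mismatch described above, independently of fqtools, is a short script along these lines. This is only a minimal sketch: the function name find_length_mismatches is made up for illustration, and it assumes plain four-line FASTQ records (header, sequence, +, quality) with no line wrapping. It is not part of NanoSim or fqtools.

```python
import io
from typing import Iterator, TextIO, Tuple


def find_length_mismatches(handle: TextIO) -> Iterator[Tuple[int, str, int, int]]:
    """Yield (header_line_number, header, seq_len, qual_len) for each record
    whose sequence and quality strings differ in length.

    Assumes four-line FASTQ records; wrapped (multi-line) records are not handled.
    """
    line_no = 0
    while True:
        header = handle.readline()
        if not header:  # end of file
            break
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\r\n")
        header_line = line_no + 1  # 1-based line number of this record's header
        line_no += 4
        if len(seq) != len(qual):
            yield header_line, header.rstrip("\r\n"), len(seq), len(qual)


# Tiny in-memory example: the second record has 5 bases but only 3 quality values.
sample = "@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIII\n"
mismatches = list(find_length_mismatches(io.StringIO(sample)))
for header_line, header, seq_len, qual_len in mismatches:
    print(f"line {header_line}: {header} has {seq_len} bases but {qual_len} quality values")
```

To run it on the file from this issue, open ./data/simulated/simulated_aligned_reads.fastq in text mode and pass the file object in place of the in-memory sample. If nothing is reported, the lengths match and the problem is more likely somewhere else, for example in the header.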
Hi @theottlo, I believe you have access to the fastq file. I have cloned the NanoSim repository some days after v3.0.0 was released.
Hi @jkomyno,
Sorry for the late reply. I finally got time to install fqtools. I repeated your simulation command, but with the pre-trained human DNA dataset models as input. Unfortunately, I couldn't reproduce the error: the validate results are OK for both aligned and unaligned reads. Could you make sure you are using the latest commit, try simulating with that pre-trained model again, and see how it goes?
Cheers, Chen
Hi, I've characterized and later simulated 20000 reads from the E. coli genome. It seems that the simulated_aligned_reads.fastq file generated in the simulation phase isn't a valid fastq file, according to fqtools's validate command.
The characterization phase command is:
The simulation phase command is:
fqtools command and validation error:
On the other hand, unaligned reads are ok: