LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

readfish 0.0.11a4 on MinION sequencing toml requires break_reads_after = 1.0 #234

Closed rainwala closed 11 months ago

rainwala commented 1 year ago

We seem to have found a situation that requires break_reads_after = 1.0 rather than break_reads_after = 0.4 in the sequencing toml file.

We have just run readfish version 0.0.11a4 from dev_staging on a MinION MK1B with the following software versions:

MinKNOW 22.12.7
MinKNOW core 5.4.7
Guppy 6.4.6
Bream 7.4.8
Script Configuration 5.4.7

Our system and GPU are: Ubuntu 18.04 LTS, NVIDIA GeForce RTX 2080

The run used the script "sequencing_MIN114_DNA_e8_2_400T_readfish.toml" with the parameter break_reads_after = 0.4

We get very poor quality scores and slow unblocking, as shown here: [image] [image]

When we change the parameter to break_reads_after = 1.0, the quality scores get much better and the unblock lengths settle down to ~500 bp: [image] [image: Screenshot_2023-03-31_17-10-43]

This makes for a run that seems like it will work, as opposed to the situation with break_reads_after = 0.4

I just wanted to flag this issue.
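For anyone wanting to check or change this value across several sequencing TOML files, here is a minimal sketch. It only assumes that the file contains a line of the form `break_reads_after = <number>`, as quoted above; the function names and the `[hypothetical_section]` header in the example are illustrative and not taken from MinKNOW or readfish. It works on the text directly so it makes no assumption about which table the key lives in.

```python
import re

# Numeric literal such as 0.4, 1 or 1.0
_NUMBER = r"[0-9]*\.?[0-9]+"

# Matches a line such as `break_reads_after = 0.4` (key name as quoted in this thread)
_FIND = re.compile(r"^\s*break_reads_after\s*=\s*(" + _NUMBER + ")", re.MULTILINE)
_REPLACE = re.compile(r"^(\s*break_reads_after\s*=\s*)" + _NUMBER, re.MULTILINE)


def find_break_reads_after(toml_text):
    """Return every break_reads_after value found in the text, as floats."""
    return [float(value) for value in _FIND.findall(toml_text)]


def set_break_reads_after(toml_text, seconds):
    """Return a copy of the text with each break_reads_after value replaced."""
    return _REPLACE.sub(lambda m: f"{m.group(1)}{seconds}", toml_text)


# Hypothetical file content: the section name is made up for illustration.
example = "[hypothetical_section]\nbreak_reads_after = 0.4\n"
updated = set_break_reads_after(example, 1.0)
print(find_break_reads_after(example), "->", find_break_reads_after(updated))
```

Keeping a copy of the original TOML before overwriting it is advisable, since MinKNOW reads these files directly.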

samhorsfield96 commented 1 year ago

I have also just encountered the issue with low read quality using V14 chemistry (sequencing_MIN114_DNA_e8_2_400T.toml) with readfish 0.0.10dev2; however, the unblock read lengths looked fine. Did the enrichment results look as expected using this fix?

rainwala commented 1 year ago

I'm about to test the enrichment over the next few days. Will post an update after that.

rainwala commented 1 year ago

OK, so with this new run even setting break_reads_after = 1.0 didn't fix the issue. We had poor quality and unblocked read lengths were well over 1kb... Maybe we need a new GPU...

mattloose commented 1 year ago

OK - whilst I would still advise one second for running at present, I am not convinced that the quality issue here is related to the break reads setting. The quality that @rainwala is showing above is lower than I would expect. As an example, here is a run we currently have on PromethION which is running adaptive with break reads at 1 second using kit14:

image (This is the modal quality score).

You can see the unblock peak is fine -

image

And the unblock read length N50 is around 900 bases.
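For comparing unblock lengths between runs, the N50 can be computed from a list of read lengths. This is a minimal sketch of the standard definition (the length at which reads of that length or longer contain at least half of all bases); how the unblocked read lengths are extracted from a run is left open, and the function name is illustrative.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads supplied


# Made-up lengths purely for illustration
print(n50([100, 200, 300, 400]))
```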

We are unblocking a lot of the data here which is why you can't see any on target reads at all.

If I zoom in on the appropriate lengths we see this:

image

Having said all the above, I do note that I see some strange differences in the sequenced reads when measuring read quality. I've not noticed specifically that this impacts enrichment, and I would actually look at the aligned data and not the quality scores as reported by MinKNOW.

My conclusion from @rainwala's observation is that this might be a more fundamental library quality issue. @samhorsfield96 can you provide any more information on what you have seen?

Also can you report MinKNOW versions here too?

rainwala commented 1 year ago

@mattloose I think it isn't necessarily our library because we get much higher Q scores with readfish / adaptive sampling (we are trying both to try to get anything to work!) turned off. It may be something related to this: https://community.nanoporetech.com/posts/low-q-score-with-new-kit-1

mattloose commented 1 year ago

That is an interesting post. Reading my comment above I realise it isn't quite clear - in my comment about a fundamental library quality issue I really meant it's something other than the change in break_reads length.

We also see some strange changes in reported read quality - I can't comment on whether re-calling the data using the super model will resolve the problem, but I can imagine that to be plausible.

What I can say is that I didn't think I had seen the quality drop off, especially in real-world data. But I will look more.

mattloose commented 1 year ago

Could you both check the accuracy of on target and off target reads after mapping to the reference and look at the quality that way?
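One rough way to do this, assuming the reads have been mapped with a tool that writes PAF (for example minimap2), is to take column 10 (residue matches) over column 11 (alignment block length) as an approximate per-read identity. The sketch below keeps the longest alignment per read and splits reads by whether they map to a contig in a supplied set. Treating whole contigs as targets is an assumption made for simplicity; targets given as coordinate ranges would need an overlap check instead. The function name is illustrative.

```python
from statistics import mean


def identity_by_target(paf_lines, on_target_contigs):
    """Mean approximate identity of on-target and off-target reads from PAF lines."""
    best = {}  # read name -> (block length, identity, contig) of its longest alignment
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        read, contig = fields[0], fields[5]
        matches, block = int(fields[9]), int(fields[10])
        if block == 0:
            continue
        if read not in best or block > best[read][0]:
            best[read] = (block, matches / block, contig)

    groups = {"on_target": [], "off_target": []}
    for _block, identity, contig in best.values():
        key = "on_target" if contig in on_target_contigs else "off_target"
        groups[key].append(identity)
    # None marks a group with no mapped reads
    return {key: (mean(vals) if vals else None) for key, vals in groups.items()}


# Two made-up PAF records for illustration
records = [
    "r1\t1000\t0\t1000\t+\tchr1\t50000\t100\t1100\t950\t1000\t60",
    "r2\t500\t0\t500\t-\tchr2\t40000\t0\t500\t400\t500\t60",
]
print(identity_by_target(records, {"chr1"}))
```

Running this separately on pass and fail reads would show whether the reported quality drop corresponds to a real drop in alignment identity.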

samhorsfield96 commented 1 year ago

Hi @mattloose sorry for not getting back sooner. Alignment looks fine using pass and fail reads - I haven't got formal measures of quality, however when looking at enrichment by abundance, values are as expected even using failed reads. I've noticed that the high volume of failed reads doesn't happen with every run either, and when running with the exact same setup and library I get different outcomes (sometimes many reads fail, sometimes the run looks normal). Not sure if this info helps your end, but it looks to me that using 'failed' reads doesn't actually impact alignment or calculation of enrichment.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

mattloose commented 11 months ago

Closing as stale.