Open RichardCorbett opened 3 years ago
This looks strange - I'm not entirely sure of the issue. Pinging @jts to see if he has any insights?
Is the command you are using to run nanopolish outside of the pipeline equivalent to the call within the pipeline?
One thing to change is to use `fast5/barcode` as the input to `--fast5-directory`, but this is unlikely to be the problem here.
My best guess is that there is something about the `sequencing_summary.txt` file that nanopolish doesn't like. I suggest running the `nanopolish index` command outside of the artic wrappers to see what it prints to stderr, which might help diagnose what has gone wrong.
Thanks folks. Overnight I was able to run a different set of data smoothly and quickly, so there must be something fishy with this one flowcell of data. I tried running just one fastq file with this command:
nanopolish index -d ../fast5/ -s ../sequencing_summary.txt -v ../plex_FAP90847_ACCACTGCCATGTATCAAAGTACG_pass_concat.fastq
[readdb] indexing ../fast5/
but I don't get any more output after that. I suspect there is some mismatch between the fastq/fast5/sequencing summary files I am using. I think I can dig in from here. Sorry for the trouble.
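One way to dig into a mismatch like this is to compare the read IDs in the fastq against those listed in the sequencing summary. Below is a rough sketch, not part of the pipeline; the function names are mine, and it assumes a tab-separated summary with a `read_id` column (header names can vary between Guppy versions):

```python
import csv

def summary_read_ids(summary_path):
    """Collect read IDs from a tab-separated sequencing_summary.txt."""
    with open(summary_path) as fh:
        return {row["read_id"] for row in csv.DictReader(fh, delimiter="\t")}

def fastq_read_ids(fastq_path):
    """Collect read IDs from fastq header lines (@<read_id> <rest-of-header>)."""
    ids = set()
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # every 4th line is a header
                ids.add(line[1:].split()[0])
    return ids

def missing_from_summary(fastq_path, summary_path):
    """Read IDs present in the fastq but absent from the summary."""
    return fastq_read_ids(fastq_path) - summary_read_ids(summary_path)
```

If this returns a non-empty set, the summary doesn't cover everything in the fastq, which matches the partial-run scenario described later in this thread.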
Hi folks,
It looks like our system is doing something possibly unique.
We are loading a MinION flowcell, but extracting the fastq files and sequencing summary at a timepoint in the middle of a run. At that point Guppy has finished basecalling the available fast5 files, but will be restarted once new fast5 files are available.
The run continues and fast5 files are generated until someone decides to shut down the sequencing run.
When I get around to starting my analysis I have the following inputs:
- fastq file from the intermediate timepoint
- sequencing summary file from the intermediate timepoint
- fast5 files from a later timepoint
This configuration gives me the hanging `nanopolish index` commands. If I change the sequencing_summary.txt file to be the one created after the run completes, the `minion` command completes in a few minutes. Is it possible that `nanopolish index` hangs when the sequencing_summary.txt file contains only a subset of the reads contained in the fast5 files?
In this case nanopolish is going to revert to the slow indexing method for the subset of fast5s that haven't been basecalled. If the run progressed well, this can mean opening and reading thousands of files, which takes a while.

To work around this, I suggest making a directory containing symlinks to the fast5s that are basecalled and present in the sequencing summary, then passing the directory-of-symlinks to `nanopolish index`. This is what we do to analyze an in-progress run.
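The directory-of-symlinks step can be scripted. A minimal sketch (the function name is mine, and it assumes the summary's fast5 column is named either `filename` or `filename_fast5`, which differs between Guppy versions):

```python
import csv
import os

def link_basecalled_fast5s(summary_path, fast5_dir, out_dir):
    """Symlink only the fast5 files referenced in the sequencing summary,
    so indexing never touches fast5s that haven't been basecalled yet."""
    os.makedirs(out_dir, exist_ok=True)
    with open(summary_path) as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        # Guppy has used both "filename" and "filename_fast5" for this column.
        col = "filename_fast5" if "filename_fast5" in reader.fieldnames else "filename"
        names = {row[col] for row in reader}
    linked = 0
    for name in sorted(names):
        src = os.path.join(fast5_dir, name)
        dst = os.path.join(out_dir, name)
        if os.path.exists(src) and not os.path.lexists(dst):
            os.symlink(os.path.abspath(src), dst)
            linked += 1
    return linked
```

Then point `nanopolish index -d` at the symlink directory instead of the raw fast5 directory.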
Wonderful. Thanks @jts
Tried this out, and indeed symlinking the fast5s referenced in the sequencing_summary.txt file brings the analysis down to just a minute or two per sample in the pool.
Hi all,
We are sequencing MinION flowcells using the ARTIC amplicons and beginning to test out the ARTIC workflow as described on this page:
ARTIC SOP 2019
Our data are sequenced as a pool of 96 samples (including some negative and positive controls), basecalled with Guppy, and index split using the approach outlined in the SOP above.
Likely as expected, the guppyplex command is efficient and runs in just a minute or so on each barcode. This gives us 96 filtered fastq files with read counts ranging from 0 to 60,000.
When I run the `minion` command like this (one job per barcode):

artic minion --normalise 200 --threads 12 --scheme-directory ~/bin/artic-ncov2019/primer_schemes/ --read-file plex_FAP90847_AAGGTTACACAAACCCTGGACAAG_pass_concat.fastq --fast5-directory fast5 --sequencing-summary sequencing_summary.txt nCoV-2019/V3 plex_FAP90847_AAGGTTACACAAACCCTGGACAAG_pass_concat
It fires up the jobs without error, but after the first few minutes the output doesn't seem to be updated, though the jobs don't crash or finish. I can see that the `nanopolish index` command is running and registering on the CPUs of my server, and there doesn't appear to be any iowait on the nodes, suggesting there isn't a network bottleneck slowing things down.

Within the first couple of minutes of the `nanopolish index` commands firing up I get files like these:

The jobs have run for another 24 hours without any more output going to the files. However, I see that `nanopolish index` has been running on my nodes and using CPU throughout.

Do you have any ideas how I should look into what is slowing this down? I have samples that have been analyzing for over 24 hours - is this expected? None of the samples that have gotten onto the cluster have finished.
I have used `nanopolish index` on entire PromethION flowcells outside of the ARTIC workflow and it runs in a few hours, so I expected to be able to index the MinION data much faster. Perhaps running multiple samples in parallel is known to create issues with `nanopolish index`?

Thanks for all your insight!