AdrienJarretier closed this issue 5 years ago
Hi @AdrienJarretier,
No worries, I already have a new NanoR version with lots of improvements almost ready to be shared with all of you. At the moment, NanoR is submitted to a scientific journal, and the last time I submitted the revision I didn't have a test set for multi-read .fast5 files. A few days ago I managed to get one, and I developed a 2.0 NanoR version that will be online shortly. The new versions are attached; feel free to use them.
This is a possible workflow for plotting statistics.
DataPass<-'/path/to/fast5_pass'
DataFail<-'/path/to/fast5_fail'
Label<-'MultiRead'
DataOut<-'/path/to/dataout'
List<-NanoPrepareM(DataPass,DataFail, Label=Label, MultiRead=TRUE)
Table<-NanoTableM(List,DataOut,Cores=6,GCC=TRUE) #or FALSE
NanoStatsM(List,Table,DataOut)
You can also extract .fastq files with user-defined quality using NanoFastqM(). Some other options were added as well, but I'll discuss them in the new README that will be published after I receive the revision.
NanoTableM.txt NanoStatsM.txt NanoPrepareM.txt NanoFastqM.txt
Amazing, I didn't expect that much that fast.
I will test that right away, thank you very much.
Let me know if it works as expected. It works for my data set, but it is always good to have a second opinion!
So no, it does not work, but it's on the right track. What I did:
NanoR.tar.gz
archive with the 4 new ones
install.packages('NanoR.tar.gz', repos=NULL)
We can observe that the files seem to be read correctly since the table is not empty, but then there is still an error.
> Label<-'MultiRead'
>
> List<-NanoPrepareM(DATA_PASS_DIR,DataFail=NA, Label=Label, MultiRead=TRUE)
1 multiread .fast5 files specified as passed
No failed .fast5 files path specified
No skipped .fast5 files path specified
Done
>
> Table<-NanoTableM(List,DATA_OUT_DIR,Cores=4,GCC=TRUE) #or FALSE
Extracting metadata and calculating GC content from multi-read .fast5 files...
Done
>
> head(Table)
Read Id Channel Number Mux Number
[1,] "000d64fc-5091-44ef-959e-2dd3aeb059fe" "87" "3"
[2,] "0045408f-5155-4a5c-a383-9d76863eaa57" "253" "1"
[3,] "005d8a76-dbe7-45df-8adf-0627aa8636a3" "472" "3"
[4,] "005fd50c-39c8-4be7-a073-abebe213bfe6" "161" "4"
[5,] "0067c135-8ca6-4e51-81a7-ce7249bae342" "355" "2"
[6,] "007d49b2-7201-4394-afc7-84d1c486a323" "295" "4"
Unix Time Length of Read Quality GC Content
[1,] "1550590230" "712" "Qscore" "0.401685393258427"
[2,] "1550590340" "939" "Qscore" "0.39297124600639"
[3,] "1550590223" "497" "Qscore" "0.378269617706237"
[4,] "1550590260" "4719" "Qscore" "0.339902521720704"
[5,] "1550590306" "4359" "Qscore" "0.379215416379904"
[6,] "1550590244" "1913" "Qscore" "0.569785676947203"
>
> NanoStatsM(List,Table,DATA_OUT_DIR)
Error in seq.default(from = min(round(Relative_Time)), to = max(round(Relative_Time)), :
'from' must be a finite number
Calls: NanoStatsM -> seq -> seq.default
In addition: Warning messages:
1: In which(as.numeric(NanoTable[, 6]) >= 7) : NAs introduced by coercion
2: In which(as.numeric(NanoTable[, 6]) < 7) : NAs introduced by coercion
3: In max(Time_2) : no non-missing arguments to max; returning -Inf
4: In min(Time_2) : no non-missing arguments to min; returning Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
7: In min(round(Relative_Time)) :
no non-missing arguments to min; returning Inf
8: In max(round(Relative_Time)) :
no non-missing arguments to max; returning -Inf
Execution halted
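The chain of warnings above can be reproduced in base R. This is only a minimal sketch: the toy matrix is made up, and the variable names `NanoTable` and `Time_2` are borrowed from the warning messages, not from the actual NanoStatsM source, which is not shown here. The assumption is that column 6 (Quality) holds the literal string "Qscore", as in the head(Table) output.

```r
# Hypothetical two-row table with the same seven columns as head(Table) above.
NanoTable <- cbind(
  c("read-a", "read-b"),          # Read Id (made-up ids)
  c("87", "253"),                 # Channel Number
  c("3", "1"),                    # Mux Number
  c("1550590230", "1550590340"),  # Unix Time
  c("712", "939"),                # Length of Read
  c("Qscore", "Qscore"),          # Quality: a string, not a number
  c("0.40", "0.39")               # GC Content
)

# "Qscore" cannot be parsed as a number, so every quality becomes NA
# (this is the "NAs introduced by coercion" warning).
quality <- suppressWarnings(as.numeric(NanoTable[, 6]))

# which() drops NA comparisons, so no read passes the >= 7 filter.
pass_idx <- which(quality >= 7)

# Selecting zero rows leaves an empty time vector.
Time_2 <- as.numeric(NanoTable[pass_idx, 4])

# min()/max() of an empty vector return Inf/-Inf (with a warning).
time_min <- suppressWarnings(min(Time_2))
time_max <- suppressWarnings(max(Time_2))

# seq() refuses a non-finite 'from', which is the final error in the log.
result <- tryCatch(
  seq(from = time_min, to = time_max, by = 1),
  error = function(e) conditionMessage(e)
)
print(result)
```

If this reading is right, the failure starts at the quality column rather than at the time values themselves.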
Well, this is not an issue. The problem is that you ran your analysis on a single multi-read .fast5 file and not on a complete run. NanoR is built to work on experiments that run for hours, not minutes (the default experimental run duration is 48 hours). Try using a complete set of multi-read .fast5 files and it will work. Otherwise, it can't rescale time correctly!
Best
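A quick way to check how much run time a table actually covers is to look at the spread of the Unix Time column. The sketch below uses only the six timestamps printed by head(Table) earlier in the thread; for a full table one would presumably use `as.numeric(Table[, 4])`, assuming Unix Time is the fourth column as in that output.

```r
# Values copied from the "Unix Time" column of the head(Table) output.
unix_time <- as.numeric(c("1550590230", "1550590340", "1550590223",
                          "1550590260", "1550590306", "1550590244"))

# Span between the earliest and the latest read.
run_seconds <- diff(range(unix_time))
run_minutes <- run_seconds / 60
run_hours   <- run_seconds / 3600

cat(sprintf("span: %d s (%.2f min, %.4f h)\n",
            as.integer(run_seconds), run_minutes, run_hours))
```

For these six reads the span is about two minutes, far below the hours-long runs the statistics are designed for.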
Yes, I thought of that. I ran it on the whole experiment too, and it took 4 hours before it crashed with this error. In 2 days I should have a brand new experiment to try it on again, though.
I just re-tested the same functions I sent you on the multi-read .fast5 files I have, and they work perfectly. It seems very strange to me that NanoStatsM took 4 hours to calculate statistics. Moreover, it seems that the quality score was not correctly extracted from your .fast5 files.
I’ll look into the sample you gave me tomorrow and let you know.
Hi @AdrienJarretier,
I've looked into your test set. First of all, you are using multiple cores to analyze a single multi-read .fast5 file. This is probably not a problem, but it's better to have at least 4 multi-read .fast5 files if you want to use 4 cores. Moreover, I changed the code for NanoTableM and NanoStatsM a little; the new versions are attached below. The quality should now be extracted without problems (I could not recreate the issue, but I guess it was related to a variable having the same name as another). I can confirm, however, that the main problem is your run duration, which is only 4 minutes (too short). Let's keep in touch for the new experiment.
You can run them as suggested before.
Closing the issue, for now
Best,
Davide
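The advice about matching cores to files can be sketched as a small helper. `pick_cores` is a hypothetical function, not part of NanoR; it simply caps the requested core count at the number of .fast5 files found in a folder, so that one multi-read file is processed with one core.

```r
# Hypothetical helper (not part of NanoR): never ask for more cores
# than there are .fast5 files to process, and never fewer than one.
pick_cores <- function(fast5_dir, max_cores = 4) {
  files <- list.files(fast5_dir, pattern = "\\.fast5$", recursive = TRUE)
  max(1L, min(as.integer(max_cores), length(files)))
}

# Demonstration on a temporary folder holding a single dummy file.
demo_dir <- tempfile("fast5_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, "batch_0.fast5")))
print(pick_cores(demo_dir, max_cores = 4))
```

The result could then be passed as the `Cores` argument shown in the workflow earlier in the thread.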
Hi, me again,
So I am trying to analyse new data from a recent sequencing run with the MinION.
Trouble is, they changed the format of the output files a bit: instead of one file per read, MinKNOW now puts multiple reads in one file, like so:
It's a good thing, since before it gave us way too many small files, which was difficult for the filesystem to handle.
So NanoTableM cannot read this. I think the culprit is actually the rhdf5 call there:
https://github.com/davidebolo1993/NanoR/blob/master/Scripts/NanoTableM.R#L49
I'm looking into it. Maybe there is a simple way to handle this with rhdf5, but if you have any input I'd be happy to hear it.
Here is a link to download the file from the screenshot above : https://we.tl/t-bwTQfZaGOf
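One possible starting point with rhdf5 is to list the top-level groups of the file and treat each one as a read. This is only a sketch under assumptions: it supposes that each read sits in its own top-level group whose name starts with `read_` (the screenshot is not reproduced here, so that layout is not confirmed by the thread), and `read_groups` and `list_reads` are hypothetical helpers, not NanoR functions. `rhdf5::h5ls()` is the real rhdf5 call for listing the contents of an HDF5 file.

```r
# Hypothetical helper: keep only the names that look like per-read groups.
# The "read_" prefix is an assumption about the multi-read layout.
read_groups <- function(top_level_names) {
  grep("^read_", top_level_names, value = TRUE)
}

# Hypothetical wrapper: needs the rhdf5 package (Bioconductor) to be installed.
list_reads <- function(fast5_path) {
  stopifnot(requireNamespace("rhdf5", quietly = TRUE))
  top <- rhdf5::h5ls(fast5_path, recursive = FALSE)  # top-level objects only
  read_groups(top$name)
}

# The pure helper can be tried without any file (made-up names):
example_names <- c("read_000d64fc-5091-44ef-959e-2dd3aeb059fe", "other_group")
print(read_groups(example_names))
```

Each returned group name could then be used as a path prefix when reading the per-read metadata, instead of assuming one read per file.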