davidebolo1993 / NanoR

Nanopore data analysis in R

problem running NanoR #9

Closed lfaino closed 5 years ago

lfaino commented 5 years ago

Dear @davidebolo1993, I get this error when I run NanoR:

NanoStatsM(NanoMList=List,NanoMTable=Table,DataOut="/data/lfainoData/trainingDataset/DataOut/", KeepGGObj = FALSE)

Analyzing...
Plotting...
Error in rbind(Label, Longest, Shortest, MeanDim, MedianDim, HQ, LQ, MeanQ, : object 'Label' not found

Can you please help me? Cheers, Luigi

davidebolo1993 commented 5 years ago

Hi @lfaino,

thanks for getting in touch and sorry for the inconvenience. If you re-install NanoR and re-run the same command, it should now work. There was a typo in a variable name.

Let me know if this fixes your issue.

lfaino commented 5 years ago

Now I get this error:

NanoStatsM(NanoMList=List,NanoMTable=Table,DataOut="/home/lfaino/lfainoData/Xylella/regression/DataOut/", KeepGGObj = FALSE) #plot statistics. To store table behind ggplot2-plots, switch KeepGGObj to TRUE

Error in seq.default(from = min(round(Relative_Time)), to = max(round(Relative_Time)), : 'from' must be a finite number
In addition: Warning messages:
1: In max(Time_2) : no non-missing arguments to max; returning -Inf
2: In min(Time_2) : no non-missing arguments to min; returning Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(round(Relative_Time)) : no non-missing arguments to min; returning Inf
6: In max(round(Relative_Time)) : no non-missing arguments to max; returning -Inf

Any idea?

davidebolo1993 commented 5 years ago

Hi @lfaino,

this seems to occur on a different dataset from the one you tested before, am I wrong? In any case, it looks to me like something went wrong with the previous command. Can you please copy a few lines from Table?

lfaino commented 5 years ago

I found the problem.

thanks

lfaino commented 5 years ago

The issue is that NanoPrepareM is recursive while NanoTableM is not. Is it possible to make NanoTableM recursive as well?

Cheers, Luigi

davidebolo1993 commented 5 years ago

Hi, sorry, I don't quite get the point.

NanoTableM extracts metadata from .fast5 files whose paths have previously been checked by NanoPrepareM. That is, if your passed .fast5 files are in subdirectories, just give NanoPrepareM the parent folder. NanoTableM uses the paths found by NanoPrepareM to extract information from each .fast5 file in the subdirectories. If NanoPrepareM is recursive, NanoTableM is recursive as well.
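The recursive lookup described above can be sketched in base R. This is only an illustration, assuming the search behaves like `list.files()` with `recursive = TRUE`; it is not NanoR's actual code, and the temporary folder layout is made up.

```r
# Illustrative sketch only: NOT NanoR's implementation.
# Build a throwaway layout that mimics parent/organism/fast5/*.fast5.
parent <- file.path(tempdir(), "regression_demo")
nested <- file.path(parent, "organism", "fast5")
dir.create(nested, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(nested, c("read1.fast5", "read2.fast5"))))

# A non-recursive listing of the parent folder finds no .fast5 files...
flat <- list.files(parent, pattern = "\\.fast5$", full.names = TRUE)

# ...while a recursive listing returns the full path of each nested file.
found <- list.files(parent, pattern = "\\.fast5$", full.names = TRUE,
                    recursive = TRUE)

length(flat)
length(found)
```

Any later step that works from the returned full paths reaches the nested files without needing its own recursive search.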

As a general suggestion, providing a minimal example helps to solve an issue more quickly.

lfaino commented 5 years ago

Let's see if I can explain better:

1) I run this command:

List<-NanoPrepareM(DataPass="/home/lfaino/regression",DataFail=NA,DataSkip=NA, Label="Exp", MultiRead=FALSE)

The fast5 files are not directly in the /home/lfaino/regression folder but in /home/lfaino/regression/organism/fast5. NanoPrepareM finds all the fast5 files in /home/lfaino/regression/organism/fast5, even though I set /home/lfaino/regression/ as the path to search.

This tells me that NanoPrepareM works recursively.

2) I run:

Table<-NanoTableM(NanoMList=List,DataOut="/home/lfaino/lfainoData/Xylella/regression/DataOut/",Cores=72,GCC=FALSE)

The table is made, but it looks like this (head of the table):

| Read Id | Channel Number | Mux Number | Unix Time | Length of Read | Quality | GC Content |
|---|---|---|---|---|---|---|
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |
| Read_Id | Channel | Mux | Unix_Time | Length | Qscore | GC_Content |

davidebolo1993 commented 5 years ago

OK, so that is not an issue with NanoTableM. This means that your .fast5 files are either not readable (lack of permissions, though that does not seem to be the case) or not basecalled, so NanoTableM cannot extract metadata. In addition, have you checked whether your .fast5 files are multi-read? If they are, simply switch MultiRead to TRUE.
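A quick sanity check on the table before plotting can make this failure mode visible. The sketch below uses a small made-up matrix in which each cell holds only its column's placeholder name, as in the table head posted above; `count_usable_rows` is a hypothetical helper, not a NanoR function.

```r
# Hypothetical helper (not part of NanoR): count rows whose time column
# can be parsed as a finite number.
count_usable_rows <- function(tab, time_col = "Unix Time") {
  times <- suppressWarnings(as.numeric(tab[, time_col]))
  sum(is.finite(times))
}

# Made-up table: every cell holds its placeholder label, no real metadata.
placeholders <- c("Read_Id", "Channel", "Mux", "Unix_Time",
                  "Length", "Qscore", "GC_Content")
bad_table <- matrix(rep(placeholders, each = 3), nrow = 3,
                    dimnames = list(NULL, c("Read Id", "Channel Number",
                                            "Mux Number", "Unix Time",
                                            "Length of Read", "Quality",
                                            "GC Content")))

usable <- count_usable_rows(bad_table)

# With no usable values, min() of an empty vector is Inf, which seq() rejects.
times <- suppressWarnings(as.numeric(bad_table[, "Unix Time"]))
times <- times[!is.na(times)]
lowest <- suppressWarnings(min(times))
seq_error <- tryCatch(seq(from = lowest, to = lowest, by = 1),
                      error = function(e) conditionMessage(e))
```

If `usable` is zero, the plotting step has nothing to work with, which is consistent with the `'from' must be a finite number` error reported earlier in this thread.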

If this does not resolve your issue, you can send me a small set of your .fast5 files. I'll have a look at them as soon as I can.

lfaino commented 5 years ago

If I remember correctly, these reads are not basecalled

davidebolo1993 commented 5 years ago

So that's the problem. Of course, NanoR cannot extract data that are not there!

Do you have a sequencing summary file in your output from Nanopore? Usually, together with .fast5 files, you get .fastq files and a sequencing summary file. You can use those to run the G version of NanoR.

davidebolo1993 commented 5 years ago

Hi @lfaino,

any updates on this ?

lfaino commented 5 years ago

Still not working, even if I use the GridION protocol.

davidebolo1993 commented 5 years ago

Then we need to be more specific (I use the G version of NanoR quite often and I do not see any issues). I need to know what the sequencing summary file looks like (head of the file) and whether you have the corresponding fastqs. Also, please copy-paste the command line you used and the error you got.

lfaino commented 5 years ago

This is the error. It looks like the list is full and the table as well, so I have no idea what the problem is now.

Analyzing...
Plotting...
Error in ggsave("Yield.pdf", device = "pdf", Cumulative_Plot, height = 10, : plot should be a ggplot2 plot or tracks object

davidebolo1993 commented 5 years ago

Can you please copy-paste all the commands you ran and send me the table you are using? This again seems to be an issue related to the data you are giving as input.

lfaino commented 5 years ago

These are the commands:

List<-NanoPrepareG(DataSummary="/data/lfainoData/FOM/newDemulti11_07_2019/first/nanoR/", DataFastq="/data/lfainoData/FOM/newDemulti11_07_2019/first/nanoR/", Cores = 30, Label="Exp")

Table<-NanoTableG(NanoGList=List,DataOut="/data/lfainoData/FOM/newDemulti11_07_2019/first/nanoR/DataOut",GCC=FALSE)

NanoStatsG(NanoGList=List,NanoGTable=Table,DataOut="/data/lfainoData/FOM/newDemulti11_07_2019/first/nanoR/DataOut", KeepGGObj = TRUE) # plot statistics. To store table behind ggplot2-plots, switch KeepGGObj to TRUE

This is the head of the table produced in the first step:

| Read Id | Channel Number | Mux Number | Relative Time | Length of Read | Quality | GC content |
|---|---|---|---|---|---|---|
| 2a3086e3-f053-43b6-afad-a4c1f729dbe8 | 123 | NA | 43635.509766 | 16600 | 8.3 | GC_Content |
| 04e1458e-44d4-40e4-a088-ece47fa1b1db | 271 | NA | 83214.570312 | 16646 | 8.323 | GC_Content |
| 900e69fc-6def-484f-8b7d-5ab75a3d2dda | 313 | NA | 36078.091797 | 26946 | 13.47325 | GC_Content |
| 34a88669-8500-4ee8-8ff0-460be3ee9bae | 339 | NA | 33305.697266 | 13009 | 6.5045 | GC_Content |
| 71be701a-eb98-4893-b733-69c4c639b5ec | 97 | NA | 32308.506836 | 20928 | 10.46425 | GC_Content |
| 2fa309e0-fdda-4957-ae04-c03aa8c92b7d | 36 | NA | 38661.201172 | 21261 | 10.63075 | GC_Content |
| b5827113-460d-4886-b534-5a9089ad129e | 153 | NA | 42016.429688 | 7279 | 3.6395 | GC_Content |
| 8badc4e0-d807-49ab-9932-aa35b91fc026 | 61 | NA | 98410.577148 | 83957 | 41.9785 | GC_Content |
| 90188590-b3de-40b9-b9b9-53e045ff31f6 | 140 | NA | 24879.667969 | 7281 | 3.6405 | GC_Content |

davidebolo1993 commented 5 years ago

Yes, can you please send me the table somehow? I need to look into this. Please send the original one as well. My email is davidebolognini7@gmail.com.

davidebolo1993 commented 5 years ago

I also have another question. Do you have the fastq files and the sequencing summary table all in the same folder? New versions of MinION and GridION X5 store the sequencing summary and the pass fastqs in different folders.

lfaino commented 5 years ago

Nope, it is a different story. I did the sequencing, then the demultiplexing, and then the basecalling with guppy. Therefore, I have 13 folders, each containing the fastqs for one barcode and a summary table for all the reads of that barcode. The path /data/lfainoData/FOM/newDemulti11_07_2019/first/nanoR/ is where all the folders produced by demultiplexing and basecalling are stored.
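One way to handle a layout like this is to gather the per-barcode summary tables into a single table before further analysis. The sketch below is a base R illustration under stated assumptions: the `sequencing_summary` file name pattern, the tab-separated format with identical columns, and the demo folders and values are all made up, and it is not known whether NanoPrepareG would accept the merged result.

```r
# Illustrative sketch only; folder names, file names and values are invented.
root <- file.path(tempdir(), "demux_demo")
for (bc in c("barcode01", "barcode02")) {
  dir.create(file.path(root, bc), recursive = TRUE, showWarnings = FALSE)
  demo <- data.frame(read_id = paste0(bc, "_read", 1:2), channel = c(1L, 2L))
  write.table(demo, file.path(root, bc, "sequencing_summary.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Find every per-barcode summary below the top-level folder.
summaries <- list.files(root, pattern = "sequencing_summary.*\\.txt$",
                        recursive = TRUE, full.names = TRUE)

# Read each one, remember which folder it came from, and stack them.
tables <- lapply(summaries, function(path) {
  tab <- read.delim(path, stringsAsFactors = FALSE)
  tab$barcode <- basename(dirname(path))
  tab
})
merged <- do.call(rbind, tables)
nrow(merged)
```

Keeping the source folder in a `barcode` column means the merged table can still be split per barcode later.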

davidebolo1993 commented 5 years ago

I see. I've never tested NanoR on this kind of data and I guess it does not support it for the time being. If you want to pursue this, I need access to your data.

Let me know.

Best,

Davide


davidebolo1993 commented 5 years ago

Provided a modified function to overcome the problem. Closing for now.