arnederoeck / NanoSatellite

Dynamic time warping of Oxford Nanopore squiggle data to characterize tandem repeats.
MIT License

error in dtw #7

Closed liutiming closed 5 years ago

liutiming commented 5 years ago

image

Hello, I am getting this error when running _Rscript Signal2chunk.R_.

However, I do not have the same issue when running it on datasets that contain fewer reads of the repeats.

Could you help look into this problem, please?

Thanks a lot!

wdecoster commented 5 years ago

Would it be possible to share the data to reproduce your issue?

liutiming commented 5 years ago

> Would it be possible to share the data to reproduce your issue?

Yes. How may I share the data with you?

wdecoster commented 5 years ago

I don't know the size of your dataset. You can find my email address in my GitHub bio, or alternatively send me a Dropbox link or use another file-sharing option.

liutiming commented 5 years ago

Shared, thanks!

wdecoster commented 5 years ago

Thanks, I downloaded the files. But I'm afraid I won't be able to reproduce your issue without the fast5 files. The amp_combined_long_chr6-16327635-16327722_spanning_reads.tsv refers to fast5 files on your /mnt/, e.g. /mnt/c/np/amp/data/20181017_0944_NP-AMP_MIX/fast5/pass/3/CN36000888_0_20181017_FAK22869_MN27391_sequencing_run_NP_AMP_MIX_72796_read_663_ch_72_strand.fast5

With the fast5 files I could repeat Spanning_read_extractor.sh.
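For anyone hitting the same situation, here is a minimal Python sketch for checking which fast5 paths referenced by a spanning reads file are missing on the current machine. It assumes only that the file is tab-separated and that a fast5 path appears as a whole field ending in `.fast5`; the function name `missing_fast5` is hypothetical and not part of NanoSatellite.

```python
import os


def missing_fast5(tsv_path):
    """Return fast5 paths mentioned in tsv_path that do not exist locally.

    Assumption: the file is tab-separated and each fast5 path occupies
    a whole field ending in ".fast5". The column position is not assumed.
    """
    missing = []
    with open(tsv_path) as handle:
        for line in handle:
            for field in line.rstrip("\n").split("\t"):
                if field.endswith(".fast5") and not os.path.exists(field):
                    missing.append(field)
    return missing
```

Calling `missing_fast5("spanning_reads.tsv")` (a placeholder filename) returns an empty list when every referenced fast5 file is reachable, which is the precondition for re-running the extraction step on another machine.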

I'll do my best to troubleshoot, as the developer of NanoSatellite (@arnederoeck) doesn't have the time right now.

liutiming commented 5 years ago

Oops, noted, thanks. I am uploading the fast5 file now, which will take a while; I will update you once the upload is done. Please note that the fast5 reads used are only a subset of the fastq reads, because I only wanted to test that NanoSatellite works, and running fewer reads reduces the computational load.

Just FYI, the SCA1_log file contains the stdout and may be helpful for troubleshooting.

Thank you very much for your help!

arnederoeck commented 5 years ago

Hi, my apologies for the late responses, I'm currently swamped with other projects. I took a look at the files you shared with Wouter, and it seems that one sequencing read is causing the issue: e15ce084-dffd-4aea-a4f6-c129e562f801 (or possibly the next read, e15ce19b-3f07-4634-a0a1-be4d00093eee). For now, I would suggest removing this read from your spanning reads file; the algorithm should then run fine. In the future, I hope I can adapt the script so that this no longer happens.
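One way to apply the suggestion above without editing the file by hand is sketched below in Python. It makes no assumption about the column layout of the spanning reads file: it drops every line that contains one of the given read IDs anywhere in its text. The function name `drop_reads` and the filenames in the usage note are hypothetical.

```python
def drop_reads(tsv_in, tsv_out, read_ids):
    """Copy tsv_in to tsv_out, skipping lines that mention any read ID.

    Assumption: a read ID appears literally somewhere in each line that
    belongs to that read. Returns (lines kept, lines dropped).
    """
    kept = dropped = 0
    with open(tsv_in) as src, open(tsv_out, "w") as dst:
        for line in src:
            if any(read_id in line for read_id in read_ids):
                dropped += 1
            else:
                dst.write(line)
                kept += 1
    return kept, dropped
```

For example, `drop_reads("spanning_reads.tsv", "spanning_reads.filtered.tsv", ["e15ce084-dffd-4aea-a4f6-c129e562f801", "e15ce19b-3f07-4634-a0a1-be4d00093eee"])` writes a filtered copy and leaves the original untouched, so the counts can be checked before the filtered file is used.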

Thanks for your patience!

liutiming commented 5 years ago

Thanks for the reply!

I removed the two reads and got this error when running:

image

Is it because my computer does not have enough capacity?

I am currently running it on a computer with a 4-core CPU and 32 GB of memory.