WGLab / DeepRepeat

An accurate repeat detection from Nanopore data using deep learning and image techniques

Error on running on example data and data available to us #4

Closed sabiqali closed 1 year ago

sabiqali commented 2 years ago

Hi,

I installed DeepRepeat and tried to run it on our data, but it failed. Thinking I might have done something wrong, I then ran it on the example data as outlined in the README, and that fails with the same error.

I have attached the output from the example-data run. It appears to fail with a segmentation fault in one of the dependencies that was built during the installation steps.

Would you be able to help us out?

``The following options are used (included default):
UniqueID (TGC_chr16_73546662-73546736);
bam (na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam);
basecalled_path (workspace/pass/);
f5config (DeepRepeat/bin/data/config/fast5_path.config);
f5folder (na12878_loci/TGC_chr16_73546662-73546736);
f5i (na12878_loci/TGC_chr16_73546662-73546736/na.f5index);
f5i_basefile (na);
feature_num (50);
label_size (4);
merge_gap (4.5);
mod_path (None);
mod_version (2);
multif5 (0);
nb_size (3);
nbsize (-1.5);
outlog (0);
outputfolder (test_op/TGC_chr16_73546662-73546736);
pcr (True);
repeat (chr16:73546662-73546736:TGC:3);
repeat_name (TGC_chr16_73546662-73546736);
repeat_pat (TGC);
rpg (DeepRepeat/bin/data/trf.v0.bed);
summary_file (na12878_loci/TGC_chr16_73546662-73546736/sequencing_summary.txt);
Generating features: DeepRepeat/bin/scripts/genomic1FE na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam test_op/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs na12878_loci/TGC_chr16_73546662-73546736/ na12878_loci/TGC_chr16_73546662-73546736/na.f5index chr16:73546662-73546736:TGC:3 DeepRepeat/bin/data/trf.v0.bed DeepRepeat/bin/data/config/fast5_path.config -1.5 500
p_f5_conf_file DeepRepeat/bin/data/config/fast5_path.config
Input = na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam test_op/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs
free(): invalid size
Segmentation fault
Error! Cannot generate fs file: test_op/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs``

liuqianhn commented 2 years ago

@sabiqali Thanks for your interest in DeepRepeat. Could you please share the relative path to your fast5 files and show the output of `h5ls -r YOUR-FAST5-FILE | head -n 50`?

sabiqali commented 2 years ago

it shows this:

/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Events Dataset {507468}
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/Configuration Group
/Analyses/Basecall_1D_000/Configuration/basecall_1d Group
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Calibration_Strand_Detection_000 Group
/Analyses/Calibration_Strand_Detection_000/Configuration Group
/Analyses/Calibration_Strand_Detection_000/Configuration/calib_detector Group
/Analyses/Calibration_Strand_Detection_000/Summary Group
/Analyses/Calibration_Strand_Detection_000/Summary/calibration_strand_template Group
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Configuration Group
/Analyses/Segmentation_000/Configuration/stall_removal Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_14276 Group
/Raw/Reads/Read_14276/Signal Dataset {2537490/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group
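For anyone comparing their own files against this listing, here is a minimal sketch that parses `h5ls -r` text output and reports whether the raw signal and the basecaller event table are present. The helper names `parse_h5ls` and `summarize_fast5` are hypothetical and not part of DeepRepeat, and the check that exactly one `/Raw/Reads/Read_N/Signal` dataset indicates a single-read fast5 is an assumption based on the layout shown in this thread.

```python
import re


def parse_h5ls(listing):
    """Map each HDF5 path in `h5ls -r` text output to its type (Group or Dataset)."""
    entries = {}
    for line in listing.splitlines():
        parts = line.split()
        # Each listing line looks like: <path> <Group|Dataset> [shape]
        if len(parts) >= 2 and parts[0].startswith("/"):
            entries[parts[0]] = parts[1]
    return entries


def summarize_fast5(entries):
    """Collect the raw-signal and basecaller-event paths found in the listing."""
    signal_paths = [
        p for p in entries if re.fullmatch(r"/Raw/Reads/Read_\d+/Signal", p)
    ]
    event_paths = [
        p
        for p in entries
        if re.fullmatch(r"/Analyses/Basecall_1D_\d+/BaseCalled_template/Events", p)
    ]
    return {
        # Assumption: one top-level Read_N signal means a single-read fast5.
        "single_read_layout": len(signal_paths) == 1,
        "signal_paths": signal_paths,
        "event_paths": event_paths,
    }


# A few lines taken from the listing posted in this thread.
sample = """\
/Analyses/Basecall_1D_000/BaseCalled_template/Events Dataset {507468}
/Raw/Reads/Read_14276/Signal Dataset {2537490/Inf}
/UniqueGlobalKey/tracking_id Group
"""
summary = summarize_fast5(parse_h5ls(sample))
print(summary)
```

In practice the text passed to `parse_h5ls` would be the saved output of the `h5ls -r` command requested above.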

jts commented 2 years ago

Just to chime in here, we're having this problem with the example data that we downloaded from the tutorial link in this repo.

liuqianhn commented 2 years ago

@jts @sabiqali: It is a C++ issue. Until it is fixed, could you please try Docker? For example:

`docker run -v /data/data1/test/deeprepeat:/tmp --rm genomicslab/deeprepeat:0.1.4 python DeepRepeat.py Detect --gn hx1 --TempRem 0 --epchon 200 --repeat_relax_bp 20 --UniqueID TGC_chr16_73546662-73546736 --is_pcr 0 --repeatName TGC_chr16_73546662-73546736 --repeat chr16:73546662-73546736:TGC:3 --f5i /tmp/na12878_loci/TGC_chr16_73546662-73546736/na.f5index --o /tmp/test_op/TGC_chr16_73546662-73546736 --bam /tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam --f5folder /tmp/na12878_loci/TGC_chr16_73546662-73546736 --algLen 500`
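Because the locus name is repeated many times in that command, it is easy to mistype. The following is a small sketch that assembles the same argument list from three inputs. It uses only the flags and values quoted in the command above; the function name `deeprepeat_docker_cmd` and the assumption that every locus follows the `na12878_loci/<locus>/` layout of the example data are illustrative, not something the DeepRepeat documentation specifies.

```python
import shlex


def deeprepeat_docker_cmd(host_dir, locus, repeat,
                          image="genomicslab/deeprepeat:0.1.4"):
    """Build the docker argument list quoted in this thread for one locus."""
    # Assumption: data for each locus sits under na12878_loci/<locus>/ on the mount.
    data = f"/tmp/na12878_loci/{locus}"
    return [
        "docker", "run", "-v", f"{host_dir}:/tmp", "--rm", image,
        "python", "DeepRepeat.py", "Detect",
        "--gn", "hx1", "--TempRem", "0", "--epchon", "200",
        "--repeat_relax_bp", "20",
        "--UniqueID", locus, "--is_pcr", "0",
        "--repeatName", locus,
        "--repeat", repeat,
        "--f5i", f"{data}/na.f5index",
        "--o", f"/tmp/test_op/{locus}",
        "--bam", f"{data}/{locus}.bam",
        "--f5folder", data,
        "--algLen", "500",
    ]


cmd = deeprepeat_docker_cmd(
    "/data/data1/test/deeprepeat",
    "TGC_chr16_73546662-73546736",
    "chr16:73546662-73546736:TGC:3",
)
# Print a shell-ready string; the list itself can go to subprocess.run(cmd).
print(shlex.join(cmd))
```

Passing the list to `subprocess.run` rather than a single string avoids shell quoting problems with the colons in the `--repeat` value.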

kaichop commented 2 years ago

Has anybody run the Docker image without encountering this problem? It may be a memory issue, since the log contains the statement "free(): invalid size". Or, as Chris mentioned, it may be a GCC version issue, where the code is not compiled correctly with a different version of GCC.

sabiqali commented 2 years ago

@kaichop, we are in the process of running the Docker image on our compute cluster. Docker takes a bit more work to use there. I will report back once I have run it as @liuqianhn suggested.

May I ask which version of GCC you expect when compiling the dependencies?

liuqianhn commented 2 years ago

This is a GCC version issue that I had not noticed before. It is not a memory issue, because I used a node with a very large amount of free memory. In closer testing, the error shows that one system C++ file does not exist, which causes the "free(): ..." error.

@sabiqali I previously tested with an older GCC (v4.x or v5.x; I do not remember the exact version).
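To make it easier to report which compiler built the dependencies, here is a rough sketch that reads the major version from `gcc -dumpversion`. The range `(4, 5)` only mirrors the recollection above and is not a documented requirement; the names `parse_major`, `gcc_major_version`, and `RECALLED_MAJORS` are hypothetical.

```python
import shutil
import subprocess


def parse_major(dumpversion_output):
    """Extract the major version from `gcc -dumpversion` output like "4.8.5" or "11"."""
    return int(dumpversion_output.strip().split(".")[0])


def gcc_major_version(compiler="gcc"):
    """Return the compiler's major version, or None if it cannot be determined."""
    path = shutil.which(compiler)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "-dumpversion"], capture_output=True, text=True
        )
        return parse_major(result.stdout)
    except (OSError, ValueError):
        # The compiler could not be run, or printed something unexpected.
        return None


# Assumption: taken from the maintainer's recollection in this thread only.
RECALLED_MAJORS = (4, 5)

major = gcc_major_version()
if major is None:
    print("gcc not found on PATH or version not readable")
elif major in RECALLED_MAJORS:
    print(f"gcc {major}: within the range the maintainer recalls testing")
else:
    print(f"gcc {major}: outside the recalled v4/v5 range")
```

Including this output in a bug report would show whether the build used a compiler outside the recalled range.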

liuqianhn commented 2 years ago

@sabiqali @jts You can also try Singularity, which might be easier. The help document can be found here: https://carpentries-incubator.github.io/singularity-introduction/05-singularity-docker/index.html

sabiqali commented 2 years ago

@liuqianhn @kaichop We tried the Docker image. It took some time and a few attempts to get it working, but it errors out as well. The command suggested above gives essentially the same error:

free(): invalid size
Aborted (core dumped)

This was the other error log:

The following options are used (included default):
UniqueID (TGC_chr16_73546662-73546736);
bam (/tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam);
basecalled_path (workspace/pass/);
f5config (/app/data/config/fast5_path.config);
f5folder (/tmp/na12878_loci/TGC_chr16_73546662-73546736);
f5i (/tmp/na12878_loci/TGC_chr16_73546662-73546736/na.f5index);
f5i_basefile (na);
feature_num (50);
label_size (4);
merge_gap (4.5);
mod_path (None);
mod_version (2);
multif5 (0);
nb_size (3);
nbsize (-1.5);
outlog (0);
outputfolder (/tmp/TGC_chr16_73546662-73546736);
pcr (True);
repeat (chr16:73546662-73546736:TGC:3);
repeat_name (TGC_chr16_73546662-73546736);
repeat_pat (TGC);
rpg (/app/data/trf.v0.bed);
summary_file (/tmp/na12878_loci/TGC_chr16_73546662-73546736/sequencing_summary.txt);
Generating features: /app/scripts/genomic1FE /tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam /tmp/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs /tmp/na12878_loci/TGC_chr16_73546662-73546736/ /tmp/na12878_loci/TGC_chr16_73546662-73546736/na.f5index chr16:73546662-73546736:TGC:3 /app/data/trf.v0.bed /app/data/config/fast5_path.config -1.5 500
p_f5_conf_file /app/data/config/fast5_path.config
Input = /tmp/na12878_loci/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.bam /tmp/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs
Error! Cannot generate fs file: /tmp/TGC_chr16_73546662-73546736/TGC_chr16_73546662-73546736.fs

As you can see, it is pretty much the same as the previous one.
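The native run ended with "Segmentation fault" and the Docker run with "Aborted (core dumped)", which correspond to different signals. As a hedged illustration of how to tell them apart when the feature-generation step is launched from a script, the sketch below decodes a process exit status. `describe_exit` is a hypothetical helper; it relies on the POSIX behaviour of Python's `subprocess`, where a negative return code means the child was killed by that signal, and on the common shell convention of reporting 128 plus the signal number.

```python
import signal


def _signal_name(number):
    """Return the symbolic name for a signal number, or a generic label."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess on POSIX: -N means the child was terminated by signal N.
        return f"killed by {_signal_name(-returncode)}"
    if returncode > 128:
        # Shell convention: status 128 + N usually means signal N.
        return f"exited with status {returncode} (possibly {_signal_name(returncode - 128)})"
    return f"exited with status {returncode}"


for code in (0, 1, -signal.SIGSEGV, 128 + signal.SIGABRT):
    print(code, "->", describe_exit(code))
```

A caller would pass `subprocess.run(cmd).returncode` to `describe_exit`, where `cmd` is whatever command launches the failing step.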

FAFUshiyan commented 2 years ago

Is this problem solved? I'm having the same problem.

liuqianhn commented 2 years ago

Please try installing glibc-source. Note: do NOT install glibc itself, especially with root permission, since installing glibc can crash your OS.

sabiqali commented 1 year ago

Hi @liuqianhn,

I can confirm that the solution suggested in Issue #6 and the subsequent changes to the environment.yml fixed the issue I was having. I can now run the software on the test data provided. I will test on the data that we have in our lab.

Thank you for your help!