WGLab / DeepRepeat

An accurate repeat detection from Nanopore data using deep learning and image techniques

Cannot generate index file #10

Open neobernad opened 1 year ago

neobernad commented 1 year ago

Hi,

We are trying to use DeepRepeat, but we always get the following error:

Error!! Cannot generate index file for /../../../tandem_repeats/test/fast5/f5.f5index.

Our fast5 files are single-read fast5 files (they were originally multi-read fast5 files and were converted to single-read). Since there are no additional log messages, do you have any idea how to proceed from here? Would the fast5 files work if they were processed by Guppy?

Best, José Antonio

fanavarro commented 1 year ago

Same here. I would like to try basecalling my fast5 files with Albacore, as it seems to be required by DeepRepeat, but I can no longer find it on the Nanopore download page. Do you know how to get the Albacore software needed by DeepRepeat?

Thanks in advance!

liuqianhn commented 1 year ago

@fanavarro @neobernad The current DeepRepeat does not support fast5 files basecalled with Guppy, since the outputs of Guppy and Albacore are completely different. The Albacore version used is v2.3.4. @kaichop can suggest future development with Guppy.

fanavarro commented 1 year ago

After inspecting the code of Fast5Index.c a bit, I've seen that, for some reason, the sequencing_summary.txt file has to be in the same directory as the FAST5 files. So, for example, the following command won't work:

./DeepRepeat/bin/scripts/IndexF5files fast5/ fast5_basecalled/ f5 sequencing_summary.txt 0

I had to move the sequencing summary file into the fast5 folder to be able to generate the index as follows:

./DeepRepeat/bin/scripts/IndexF5files fast5/ fast5_basecalled/ f5 fast5/sequencing_summary.txt 0
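The observation above can be turned into a small pre-flight check. The following is a minimal sketch, not part of DeepRepeat: `summary_in_dir` is a hypothetical helper name, and it only tests the condition reported in this comment, namely that the summary file sits directly inside the fast5 directory.

```shell
# Hypothetical helper (not part of DeepRepeat): succeeds only when the
# sequencing summary file is located directly inside the given fast5 directory.
summary_in_dir() {
    fast5_dir=$1
    summary=$2
    # The summary must exist as a regular file.
    [ -f "$summary" ] || return 1
    # Resolve both directories to absolute physical paths before comparing.
    want=$(cd "$fast5_dir" 2>/dev/null && pwd -P) || return 1
    have=$(cd "$(dirname "$summary")" 2>/dev/null && pwd -P) || return 1
    [ "$want" = "$have" ]
}
```

With the layout described here, `summary_in_dir fast5/ fast5/sequencing_summary.txt` would be expected to succeed, while `summary_in_dir fast5/ sequencing_summary.txt` would fail.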

kaichop commented 1 year ago

You can find it at https://community.nanoporetech.com/downloads, but it is best to just ask Nanopore for a copy, since an account is needed to access their software tools.

fanavarro commented 1 year ago

Hi @kaichop, thanks for your answer.

Actually, I could not find the Albacore software at your link; I had to ask Nanopore, and they provided the software through the following link:

https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-2.3.4-cp36-cp36m-manylinux1_x86_64.whl

kaichop commented 1 year ago

Thank you! Maybe they now display only new software or new versions, and keep older ones archived.

neobernad commented 1 year ago

Hi @kaichop,

We've been trying to work out how to generate the fast5 index, without success.

First, we are running the dockerized version to execute DeepSignal, and it prints the following:

Generating f5index: /app/scripts/IndexF5files /workspace/us/analysis/test_id/tandem_repeats/original/fast5 fast5_basecalled f5 /workspace/us/analysis/test_id/tandem_repeats/original/fast5/sequencing_summary.txt True
Error!! Cannot generate index file for /workspace/us/analysis/test_id/tandem_repeats/original/fast5/f5.f5index.

However, if we log into the docker, as in:

docker run --rm -it --entrypoint bash -v "/workspace/":/workspace/  genomicslab/deeprepeat:0.1.3

And run the same command internally, it seems it does not complain:

/app/scripts/IndexF5files /workspace/us/analysis/test_id/tandem_repeats/original/fast5 fast5_basecalled f5 /workspace/us/analysis/test_id/tandem_repeats/original/fast5/sequencing_summary.txt True

Which prints:

root@25b7f16a886c:/app# /app/scripts/IndexF5files /workspace/us/analysis/test_id/tandem_repeats/original/fast5 fast5_basecalled f5 /workspace/us/analysis/test_id/tandem_repeats/original/fast5/sequencing_summary.txt True
Options to be used:
    base_index_path = /workspace/us/analysis/test_id/tandem_repeats/original/fast5
    basecalled_path = fast5_basecalled
    uniq_id = f5
    seq_sum = /workspace/us/analysis/test_id/tandem_repeats/original/fast5/sequencing_summary.txt
    multifast5 = 0

The input sequencing summary are:
    /workspace/us/analysis/test_id/tandem_repeats/original/fast5/sequencing_summary.txt
The basecall path is <fast5_basecalled>
The output index file is </workspace/us/analysis/test_id/tandem_repeats/original/fast5/f5.f5index>
   </workspace/us/analysis/test_id/tandem_repeats/original/fast5/fast5_basecalled/*/*.fast5> has 664712 fast5 files

Base index is saved in </workspace/us/analysis/test_id/tandem_repeats/original/fast5/f5.f5index>.
Please keep this file together with its indexed fast5 folders.
You might need to provide this file for feature extraction and repeat detection.

Notice that it now says '</workspace/us/analysis/test_id/tandem_repeats/original/fast5/fast5_basecalled/*/*.fast5> has 664712 fast5 files'. However, the index file at /workspace/us/analysis/test_id/tandem_repeats/original/fast5/f5.f5index is empty.
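The mismatch described here (fast5 files are found, but the index is empty) can be checked with a short script. This is a minimal sketch under assumptions: `count_fast5` and `index_has_content` are hypothetical helper names, not DeepRepeat commands, and the `*/*.fast5` pattern is copied from the log output above rather than from the DeepRepeat source.

```shell
# Hypothetical helpers (not part of DeepRepeat).

# Count files matching <base>/<basecalled>/*/*.fast5, the pattern shown in the log.
count_fast5() {
    n=0
    for f in "$1/$2"/*/*.fast5; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

# Succeed only if the index file exists and is larger than zero bytes.
index_has_content() {
    [ -s "$1" ]
}
```

Called with the base path and `fast5_basecalled` from the log, `count_fast5` gives an independent file count to compare against the number the tool reports, and `index_has_content` on the f5.f5index path fails when the index is empty.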

What are we doing wrong?

Best, José Antonio

Slowlysun commented 1 year ago

Hello, did you solve this problem? I have the same problem: the index file is empty, and I would like to know how you solved it.

liuqianhn commented 1 year ago

@neobernad @Slowlysun When you run docker, you need to map your local directory to a directory inside the container. For example, if you run docker run --rm -it --entrypoint bash -v "/workspace/":/workspace/ genomicslab/deeprepeat:0.1.3, you need to make sure that the path /workspace exists on your local computer. Otherwise, docker cannot find the folder.
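A quick way to rule out the mounting problem described above is to confirm that the host directory exists before starting the container. The sketch below is illustrative only: `host_dir_exists` is a hypothetical helper, and the docker invocation in the comment is the one quoted in this thread. With `-v`, Docker typically creates a missing host directory as an empty one, so the container would then see no files.

```shell
# Hypothetical helper: report whether a host directory that will be passed to
# `docker run -v` actually exists on the local machine.
host_dir_exists() {
    if [ -d "$1" ]; then
        printf 'ok: %s exists\n' "$1"
    else
        printf 'missing: %s\n' "$1" >&2
        return 1
    fi
}

# Example (command as quoted in this thread):
# host_dir_exists /workspace/ && \
#     docker run --rm -it --entrypoint bash -v "/workspace/":/workspace/ genomicslab/deeprepeat:0.1.3
```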

Slowlysun commented 1 year ago

I installed DeepRepeat with conda and ran only IndexF5files to generate the index file. It seems to have found 10 fast5 files, but the resulting index is empty.

Options to be used:
    base_index_path = ./
    basecalled_path = ./fast5s/
    uniq_id = f5
    seq_sum = sequencing_summary.txt
    multifast5 = 0

The input sequencing summary are:
    sequencing_summary.txt
The basecall path is <./fast5s/>
The output index file is <f5.f5index>
   <./fast5s//.fast5> has 10 fast5 files

Base index is saved in <f5.f5index>.
Please keep this file together with its indexed fast5 folders.
You might need to provide this file for feature extraction and repeat detection.

liuqianhn commented 1 year ago

@Slowlysun Could you please share the full commands, and the folder of fast5 input? Thank you.

Slowlysun commented 1 year ago

The sequencing_summary.txt file and the fast5s directory are both in the current directory:

$ ls ./
fast5s/ sequencing_summary.txt
$ ls ./fast5s/0/
FAT78428_pass_barcode11_f2a03b54_63267858_0.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_1.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_2.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_3.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_4.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_5.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_6.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_7.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_8.fast5
FAT78428_pass_barcode11_f2a03b54_63267858_9.fast5

Command used to generate the index file:

/work/software/DeepRepeat/bin/DeepRepeat_scripts/IndexF5files ./ fast5s/ f5 ./sequencing_summary.txt 0
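One further diagnostic, offered as a sketch rather than a confirmed fix, is to compare the file names listed in the sequencing summary with the fast5 files on disk. This assumes the summary is tab-separated with one header line and that its first column holds the fast5 file name. This thread does not confirm how IndexF5files reads the summary, and `summary_names_missing` is a hypothetical helper.

```shell
# Hypothetical helper (not part of DeepRepeat): print file names that appear in
# the first column of the summary but have no matching *.fast5 file on disk.
# Assumes a tab-separated summary with one header line.
summary_names_missing() {
    summary=$1
    fast5_root=$2
    work=$(mktemp -d) || return 1
    # Base names of all fast5 files below fast5_root, sorted for comm.
    find "$fast5_root" -type f -name '*.fast5' | sed 's|.*/||' | sort -u > "$work/disk"
    # First column of the summary without the header, reduced to base names.
    cut -f1 "$summary" | tail -n +2 | sed 's|.*/||' | sort -u > "$work/summary"
    # Lines only in the summary list are names with no file on disk.
    comm -23 "$work/summary" "$work/disk"
    rm -rf "$work"
}
```

For the layout above, this could be run as `summary_names_missing ./sequencing_summary.txt ./fast5s/`. Empty output means every name listed in the summary was found on disk.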