Closed sagnikbanerjee15 closed 9 months ago
Hi,
Is the BAM file sorted and indexed? It seems like the BAM file is not indexed.
You can index the file with: samtools index ../new_batch0212/rawData/pass/barcode77/merged.bam
Also, is your data generated using R10.4* flowcells? If so, I would not suggest using tombo to process the files because tombo cannot resquiggle R10.4 reads.
Hi,
I tried the same command after sorting and indexing the bam file. It gives me the same error.
Thank you.
I think it might be because the BAM file is unaligned. I was able to reproduce this error when I used an unaligned BAM file. If you did not specify a reference genome to Guppy, it will have produced an unaligned BAM file, and you would need to align the reads to the reference genome first. If you have FASTQ files from Guppy, you can use those, or you can convert the unaligned BAM file to FASTQ using samtools fastq.
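One quick way to confirm that a BAM file is unaligned is to look at the FLAG column of its records (for example in the text output of samtools view): bit 0x4 marks a read as unmapped. The sketch below is only an illustration; the function name count_unmapped and the example records are made up and are not part of DeepMod2 or samtools.

```python
# Sketch: count mapped vs. unmapped records in SAM text (e.g. output of `samtools view -h`).
# Bit 0x4 of the SAM FLAG field marks a read as unmapped.

def count_unmapped(sam_lines):
    """Return (mapped, unmapped) counts for an iterable of SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Made-up records, for illustration only.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!",
    "read2\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\t!!!!",
]
counts = count_unmapped(example)
print(counts)
```

If every record is counted as unmapped, the BAM most likely came straight from the basecaller without a reference and needs to be aligned first.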
Thanks for your reply. Aligning the reads worked. But it generated a different error:
Traceback (most recent call last):
File "/DeepMod2/deepmod2", line 150, in <module>
run(params)
File "/DeepMod2/deepmod2", line 10, in run
from src import modDetect
File "/DeepMod2/src/modDetect.py", line 14, in <module>
from . import guppy
File "/DeepMod2/src/guppy.py", line 15, in <module>
from tensorflow import keras
ModuleNotFoundError: No module named 'tensorflow'
Please install tensorflow using pip install tensorflow. You can find a list of all the packages required here: https://github.com/WGLab/DeepMod2/blob/main/environment.yml
You can install all packages required at once in a new conda environment by following these directions: https://github.com/WGLab/DeepMod2/blob/main/docs/Install.md. You can also install packages one by one in your current environment.
I followed the installation steps outlined in https://github.com/WGLab/DeepMod2/blob/main/docs/Install.md, but I went ahead and did a pip install for tensorflow. Now I get a different error:
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/DeepMod2/src/guppy.py", line 176, in detect
base_level_data, seq_len, mean_qscore, sequence_length = get_read_signal(read, params['guppy_group'])
File "/DeepMod2/src/guppy.py", line 121, in get_read_signal
segment=read.get_analysis_attributes(guppy_group)['segmentation']
TypeError: 'NoneType' object is not subscriptable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/DeepMod2/deepmod2", line 150, in <module>
run(params)
File "/DeepMod2/deepmod2", line 11, in run
read_pred_file=modDetect.per_read_predict(params)
File "/DeepMod2/src/modDetect.py", line 59, in per_read_predict
file_list=[file_name for file_name in res]
File "/DeepMod2/src/modDetect.py", line 59, in <listcomp>
file_list=[file_name for file_name in res]
File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 870, in next
raise value
TypeError: 'NoneType' object is not subscriptable
The command I am running is:
python /DeepMod2/deepmod2 detect-guppy --file_name barcode77_5mc --threads 8 --fast5 ../fast5_pass/barcode77 --ref ../zymo_methylated_amplicon_sequence.fasta --bam aligned.bam --model /DeepMod2/src//models/guppy/guppy_r10.4/model.58-0.9800.h5 --guppy_group barcode77
It seems like the FAST5 files do not contain move tables. What version of Guppy did you use and did you use --fast5_out option for running Guppy? If yes, then DeepMod2 would need the fast5 output from Guppy, not the fast5 files given as input to Guppy.
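The traceback above shows that get_analysis_attributes returned None for the requested group, and indexing None is what raises the TypeError. A minimal sketch of a more explicit check is below. FakeRead is a hypothetical stand-in for the FAST5 read object, get_segmentation_group is an illustrative name, and the attribute values are invented; none of this is DeepMod2's actual code.

```python
# Sketch: fail with a clear message when a basecall group is missing from a read.

class FakeRead:
    """Hypothetical stand-in for a FAST5 read object (not the real API class)."""

    def __init__(self, groups):
        self._groups = groups

    def get_analysis_attributes(self, group):
        # Mirrors the behaviour implied by the traceback: None for a missing group.
        return self._groups.get(group)


def get_segmentation_group(read, guppy_group):
    """Return the 'segmentation' attribute of a basecall group, or raise a clear error."""
    attrs = read.get_analysis_attributes(guppy_group)
    if attrs is None:
        raise ValueError(
            f"Basecall group {guppy_group!r} not found in this read; "
            "check --guppy_group and that the FAST5 files are Guppy's --fast5_out output."
        )
    return attrs["segmentation"]


# Invented attribute values, for illustration only.
read = FakeRead({"Basecall_1D_000": {"segmentation": "Segmentation_000"}})
found = get_segmentation_group(read, "Basecall_1D_000")

try:
    get_segmentation_group(read, "barcode77")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(found, "|", error_message)
```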
Hi @umahsn,
Thank you for getting back to me. Guppy was executed as part of the MinKNOW software on the GridION GPU framework. The data was generated in the lab and I was given access to the final results. Since Guppy was run via MinKNOW, I am not sure which commands were executed, because I don't know whether MinKNOW reports them. The version of Guppy we used was 6.1.5.
Thank you.
I figured out the errors and was able to fix them. I am getting results now.
Thanks.
The latest version of guppy has decided to remove the fast5_out option. Instead, they have included the move tables in the bam file. Can the unmapped bam files be provided as input to DeepMod2?
Thanks.
I am trying to execute deepmod2 again and I keep getting the following error.
2023-03-27 23:47:32.120599: Starting DeepMod2.
2023-03-27 23:47:32.120900:
Command: python /DeepMod2/deepmod2 detect-guppy --fast5 /var/lib/cwl/stgcfe517fc-3bc3-49ce-b037-c89baa44873c/workspace --ref zymo_methylated_amplicon_sequence.fasta --bam combined_aligned.sortedByPos.bam --threads 72 --output workspace_results --model /DeepMod2/src/models/guppy/guppy_r9.4.1/model.40-0.9370.h5
2023-03-27 23:47:32.678010: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-27 23:47:32.731728: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-27 23:47:32.732341: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-27 23:47:33.771137: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-03-27 23:47:35.048664: Number of files: 1
2023-03-27 23:47:35.334276: Processing BAM File.
2023-03-27 23:47:35.409979: Finished Processing BAM File.
2023-03-27 23:47:35.410024: Starting Per Read Methylation Detection.
2023-03-27 23:47:35.746922: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
[[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-27 23:47:35.748509: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
[[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-27 23:47:35.749936: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_1_grad/concat/split_1/split_dim' with dtype int32
[[{{node gradients/split_1_grad/concat/split_1/split_dim}}]]
2023-03-27 23:47:35.936717: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/ReverseV2_grad/ReverseV2/ReverseV2/axis' with dtype int32 and shape [1]
[[{{node gradients/ReverseV2_grad/ReverseV2/ReverseV2/axis}}]]
2023-03-27 23:47:35.996007: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
[[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-27 23:47:35.997497: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
[[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-27 23:47:35.998919: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_1_grad/concat/split_1/split_dim' with dtype int32
[[{{node gradients/split_1_grad/concat/split_1/split_dim}}]]
2023-03-27 23:47:36.267879: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
[[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-27 23:47:36.270257: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
[[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-27 23:47:36.271667: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_1_grad/concat/split_1/split_dim' with dtype int32
[[{{node gradients/split_1_grad/concat/split_1/split_dim}}]]
2023-03-27 23:47:36.448399: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/ReverseV2_grad/ReverseV2/ReverseV2/axis' with dtype int32 and shape [1]
[[{{node gradients/ReverseV2_grad/ReverseV2/ReverseV2/axis}}]]
2023-03-27 23:47:36.506943: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
[[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-27 23:47:36.508415: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
[[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-27 23:47:36.509852: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_1_grad/concat/split_1/split_dim' with dtype int32
[[{{node gradients/split_1_grad/concat/split_1/split_dim}}]]
2023-03-27 23:47:36.778613: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
[[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-27 23:47:36.780122: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
[[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-27 23:47:36.781558: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_1_grad/concat/split_1/split_dim' with dtype int32
[[{{node gradients/split_1_grad/concat/split_1/split_dim}}]]
2023-03-27 23:47:36.959524: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/ReverseV2_grad/ReverseV2/ReverseV2/axis' with dtype int32 and shape [1]
[[{{node gradients/ReverseV2_grad/ReverseV2/ReverseV2/axis}}]]
2023-03-27 23:47:37.018016: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
[[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-27 23:47:37.019487: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
[[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-27 23:47:37.020919: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_1_grad/concat/split_1/split_dim' with dtype int32
[[{{node gradients/split_1_grad/concat/split_1/split_dim}}]]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/DeepMod2/src/guppy.py", line 194, in detect
base_seq=[base_map[fq[x]] for x in range(read_pos-window, read_pos+window+1)]
File "/DeepMod2/src/guppy.py", line 194, in <listcomp>
base_seq=[base_map[fq[x]] for x in range(read_pos-window, read_pos+window+1)]
IndexError: string index out of range
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/DeepMod2/deepmod2", line 150, in <module>
run(params)
File "/DeepMod2/deepmod2", line 11, in run
read_pred_file=modDetect.per_read_predict(params)
File "/DeepMod2/src/modDetect.py", line 59, in per_read_predict
file_list=[file_name for file_name in res]
File "/DeepMod2/src/modDetect.py", line 59, in <listcomp>
file_list=[file_name for file_name in res]
File "/opt/conda/lib/python3.10/multiprocessing/pool.py", line 873, in next
raise value
IndexError: string index out of range
The command to execute guppy is
GUPPY_VERSION=6.3.8
nohup guppy_basecaller \
--save_path /local/2023_03_26_Brain_Heart_Zymo_barcode77_small_${GUPPY_VERSION} \
--config dna_r9.4.1_e8.1_modbases_5mc_cg_sup.cfg \
--progress_stats_frequency 10 \
--input_path /local/Brain_Heart_Zymo/fast5/barcode77_small \
--compress_fastq \
--recursive \
--barcode_kits SQK-NBD112-96 \
--fast5_out \
--bam_out \
--verbose_logs \
--cpu_threads_per_caller 72 \
1> /local/2023_03_26_Brain_Heart_Zymo_barcode77_small_${GUPPY_VERSION}.output \
2> /local/2023_03_26_Brain_Heart_Zymo_barcode77_small_${GUPPY_VERSION}.error &
Please let me know what I should change.
Thank you.
@umahsn Could you please help me with this? Thanks
Hi,
Currently it is not possible to use unmapped BAM file or use move tables from BAM file, but given that this is going to be the default format for Guppy/Dorado, we will release an update soon which will allow unmapped BAM files.
With regard to the last error, it seems like there is a CpG site located near the end of the read, and it is causing an index out of range error because DeepMod2 requires a window of ±10 bp around the CpG site. This should not typically be a problem, because DeepMod2 checks for and ignores such cases. Here the check seems to be failing, which could be because the FASTQ record from the FAST5 file has a different length from the one in the BAM file.
Does your FAST5 file have multiple basecall groups? If so, please use the same basecall group when running DeepMod2 via the --guppy_group parameter. In your case this should be the basecall group that was produced by Guppy 6.3.8.
Hi @umahsn,
Thank you for your reply. Yes, it is possible for the fast5 file to contain more reads. The guppy basecaller created a pass and a fail folder. The previous command was executed with reads from only the pass folder. I have modified it to now include reads both from the pass and from the fail folder.
But I keep getting the same error.
Also, I am not sure what you mean by "basecall group produced by Guppy 6.3.8". I tried to call deepmod2 with --guppy_group pass but that didn't work.
Thank you.
Hi,
Sorry, I meant whether the FAST5 files have more than one set of basecalls for each read. Can you run h5dump -n sample.fast5 | head -50 to check what the basecall groups are? I suspect you have a Basecall_1D_000 group from basecalling via MinKNOW and another, Basecall_1D_001, from basecalling via Guppy 6.3.8 (assuming you did not clear the old basecalling data in between).
If you have both Basecall_1D_000 and Basecall_1D_001, then run DeepMod2 with the --guppy_group Basecall_1D_001 parameter.
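For a file with many reads, the h5dump listing can be long. A small sketch for pulling the distinct basecall group names out of that listing is below. It only assumes the Basecall_1D_NNN naming used in this thread; the function name basecall_groups and the sample listing are illustrative, not real output from the files discussed here.

```python
# Sketch: extract distinct basecall group names from the text printed by `h5dump -n`.
import re


def basecall_groups(h5dump_listing):
    """Return the sorted, de-duplicated Basecall_1D_NNN group names found in the text."""
    return sorted(set(re.findall(r"Basecall_1D_\d{3}", h5dump_listing)))


# Illustrative listing only; real paths depend on the FAST5 layout.
sample_listing = """
 group      /read_0001/Analyses
 group      /read_0001/Analyses/Basecall_1D_000
 group      /read_0001/Analyses/Basecall_1D_000/BaseCalled_template
 group      /read_0001/Analyses/Basecall_1D_001
 group      /read_0001/Analyses/Segmentation_000
"""
groups = basecall_groups(sample_listing)
print(groups)
```

If more than one name comes back, the most recent basecalling run is presumably the highest-numbered group, which is the one to pass to --guppy_group.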
Hi @sagnikbanerjee15,
Thank you for your reply. I checked the fast5 file using the h5dump command and found that only Basecall_1D_000 was there. I executed deepmod2 with that group but I ended up getting the same error.
I merged the fastqs from both pass and fail.
Thank you.
Can you try running DeepMod2 after replacing line 194 in src/guppy.py with the following:
try:
    base_seq=[base_map[fq[x]] for x in range(read_pos-window, read_pos+window+1)]
except IndexError:
    print(read_pos, seq_len, len(fq), sequence_length)
    sys.exit()
And let me know what you get?
Thank you
Thanks for your reply.
Here is the output 674 800 676 676
Thank you
OK, it seems like the read length estimated from the move table is 800, whereas the FASTQ record is 676 bases long. These two should be the same. A quick fix is to use the length of the FASTQ record, which I will do for now.
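The numbers printed above make the failure easy to reproduce. The sketch below assumes that the bounds check compared the CpG position against the move-table length (800) rather than the FASTQ length (676); that is an inference from this thread, and window_in_bounds is an illustrative name, not DeepMod2's function.

```python
# Sketch: why a +-10 bp window at position 674 passes one length check and fails the other.

def window_in_bounds(read_pos, length, window=10):
    """True if every index in [read_pos - window, read_pos + window] is valid for `length`."""
    return read_pos - window >= 0 and read_pos + window < length


read_pos, seq_len, fq_len = 674, 800, 676  # values reported in the debug output

passes_with_move_table_length = window_in_bounds(read_pos, seq_len)
passes_with_fastq_length = window_in_bounds(read_pos, fq_len)
print(passes_with_move_table_length, passes_with_fastq_length)
```

With the move-table length the window looks safe, so the code goes on to index the 676-base FASTQ string at positions up to 684, which is out of range. Checking against the FASTQ length skips the site instead.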
Hi, I have fixed the bug. Please let me know if it works now. Also, there are new Guppy r9.4.1 models added, so you can check them out as well: guppy_R9.4.1 is recommended, and guppy_hg1_R9.4 is the updated version of the previous model.
Hi @umahsn,
Thank you for the fast response. I can confirm that it is currently working and generating the expected results. I was able to run it without the group parameter as well.
Please consider updating deepmod2 to use outputs from the recent version of guppy.
Thank you.
Hi @umahsn,
Would it be possible to give me an approximate timeline when deepmod2 will be updated to work with unaligned bam files with move tables?
Thank you.
Hi,
We are aiming to have a release to address that by the end of this month.
Hello @umahsn,
Has the new update been released yet? If not, do you have a new date for release?
Thank you
Hi,
I am currently working on the BAM move table and have implemented it locally, and it works fine, but I am still testing it and need to extend it to allow aligned BAM files as well and use alignment info for reference anchoring. Sorry for the delay, we will have a new release by the middle of May.
No worries. I was just curious about the timeline. Also, please instruct me on which models to use for modified base-calling.
Thank you.
Hello, Do you have any updates about this request?
Thank you.
Hi,
We have released support for aligned and unaligned BAM files with move tables, as well as for POD5. Please check this document for more details: https://github.com/WGLab/DeepMod2/blob/main/docs/Example.md
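Before using the BAM route, it may be worth checking that the records actually carry move tables. The sketch below assumes the move table is stored in an optional array tag named mv, which is how recent ONT basecallers are understood to emit it; the function name has_move_table and the records are made up for illustration.

```python
# Sketch: check whether a SAM text record carries an optional 'mv' array tag.

def has_move_table(sam_line):
    """Return True if any optional field of the SAM record is an 'mv' array (mv:B:...)."""
    fields = sam_line.rstrip("\n").split("\t")
    optional = fields[11:]  # the first 11 columns of a SAM record are mandatory
    return any(field.startswith("mv:B:") for field in optional)


# Made-up records, for illustration only.
with_mv = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tmv:B:c,5,1,0,1,1"
without_mv = "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tNM:i:0"
print(has_move_table(with_mv), has_move_table(without_mv))
```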
Hello,
I am trying to perform 5mC base calls with fast5 files. I merged all the bam files that guppy reported using samtools merge and provided the merged bam file as input to deepmod2. I keep getting the following error. The command I executed is
I also tried converting the fast5 files to tombo fast5, but the tombo process failed with the error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Could you please look into this?
Thank you.
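One possible reading of the UnicodeDecodeError above: gzip files begin with the bytes 0x1f 0x8b, so a failure on byte 0x8b at position 1 is what you would see if a gzip-compressed file (for example a FASTQ written with --compress_fastq) were opened as plain text. This is only a guess at the cause. The sketch below reproduces the message and shows one way to open such a file; open_text_maybe_gzip is an illustrative helper, not part of tombo or DeepMod2.

```python
# Sketch: detect the gzip magic number and open a file as text either way.
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def open_text_maybe_gzip(path):
    """Open `path` for text reading, decompressing on the fly if it is gzip-compressed."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")


# Decoding the magic bytes as UTF-8 fails on 0x8b at position 1.
try:
    GZIP_MAGIC.decode("utf-8")
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)

# Round trip through a temporary gzip-compressed FASTQ with a made-up record.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.fastq.gz")
    with gzip.open(path, "wt") as out:
        out.write("@read1\nACGT\n+\n!!!!\n")
    with open_text_maybe_gzip(path) as reads:
        first_line = reads.readline().strip()

print(message)
print(first_line)
```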