Closed MaestSi closed 5 years ago
Hey @MaestSi, can you post the values of $INDEX and $FAST5? STRique is not complaining if it does not find fast5 files and the index file stores only relative paths. Pay
Sure, here they are:
FAST5=/home/simone/MinION/Cas9/20190328_1110_MDCas9 INDEX=/home/simone/MinION/Cas9/MD_Cas9.fofn
Simone
Okay I guess the issue is the index then: You ran the indexing for the folder
/home/simone/MinION/Cas9/20190328_1110_MDCas9
therefore you would need to store your index as e.g.
/home/simone/MinION/Cas9/20190328_1110_MDCas9/MD_Cas9.fofn
Alternatively you can use a (not yet documented) feature and run:
$STRIQUE index --out_prefix 20190328_1110_MDCas9 --recursive $FAST5 > $INDEX
To explain this: The index.fofn contains relative paths to the fast5 files, starting from the directory you provide in the indexing step. If not saved to the correct location, the paths are not valid and no reads are found resulting in no output. If this causes similar problems in the future, I'll implement some checking and warning message if the index seems to be corrupted.
Pay
I changed INDEX to:
INDEX=/home/simone/MinION/Cas9/20190328_1110_MDCas9/MDCas9.fofn
but still got no output. Should it be correct? As an example, the first line of INDEX file is:
fast5/20/Marta_PC_20190328_FAJ01316_MN20885_sequencing_run_MDCas9_85850_read_45042_ch_418_strand2.fast5 f99a3273-d83e-452d-8570-982b910b32a0
Looks okay to me.
ls home/simone/MinION/Cas9/20190328_1110_MDCas9/fast5/20/Marta_PC_20190328_FAJ01316_MN20885_sequencing_run_MDCas9_85850_read_45042_ch_418_strand2.fast5
should list you the fast5 file.
Could you please check, that alignment and repeat config are on the same coordinates (hg19/hg38,...). STRique is scanning the alingment cigar and only starting a detection, if alignment (+ soft-clipping) spans any of the configured target regions.
I ran the test/example commands from the README, giving me the expected output line. Saying if index, configuration and input alignment match, you should definitely see some output.
Yes, the above command lists the fast5 file. All genomic coordinates are hg38-based. I tried again giving slightly different prefix and suffix sequences in the config file, but still didn't succeed. May lower-case letters in the reference (and not in the repeat, prefix and suffix sequences listed in the config file) be an issue?
Lower-case in reference doesn't matter, the prefix/suffix sequences in the config should be upper case. You would also get an error otherwise. I set up a better logging and pushed a new version to the master branch, could you please update using:
cd STRique
git pull
python3 setup.py install
and re-run the pipeline as above but with --log_level debug
samtools view $BAM | $STRIQUE count $INDEX $STRIQUE_ROOT/models/template_median68pA6mer.model $REPEAT --t 30 --log_level debug > STRique_output.txt
and post the output here. This will also print target names, feel free to mask confidential information in there!
A different error occurred this time. This is a part of the error message printed to stdout.
09.04.2019 13:29:42 [PID 33196] [INFO] Logger created. 09.04.2019 13:29:42 [PID 33196] [DEBUG] Main: Parsed config. 09.04.2019 13:29:43 [PID 33235] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads. Traceback (most recent call last):
File "/home/simone/MinION/software/STRique/scripts/STRique.py", line 682, in worker input = worker_callable(**input)
File "/home/simone/MinION/software/STRique/scripts/STRique.py", line 611, in detect self.__init_hmm__()
File "/home/simone/MinION/software/STRique/scripts/STRique.py", line 572, in __init_hmm__ self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
File "/home/simone/MinION/software/STRique/scripts/STRique.py", line 499, in add_target flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "/home/simone/MinION/software/STRique/scripts/STRique.py", line 397, in init self.__build_model__()
File "/home/simone/MinION/software/STRique/scripts/STRique.py", line 426, in __build_model__ self.bake(merge='All')
File "pomegranate/hmm.pyx", line 765, in pomegranate.hmm.HiddenMarkovModel.bake
TypeError: 'method' object is not iterable
The output file is empty a part from the header as before.
Okay, so building the model fails. Can you confirm, that not only the script STRique_test.py passes but also the following lines from the README?
python3 scripts/STRique.py index data/ > data/reads.fofn
cat data/c9orf72.sam | python3 scripts/STRique.py count ./data/reads.fofn ./models/template_median68pA6mer.model ./configs/repeat_config.tsv --config ./configs/STRique.json
If they give the expected output, you installation in general works and there must be something wrong with the repeat config file. Can you share the config, if you want send it per mail to giesselmann[at]molgen.mpg.de
The second command gives error:
[INFO] not in [<log_type.Error: '[ERROR]'>, <log_type.Warning: '[WARNING]'>] [DEBUG] not in [<log_type.Error: '[ERROR]'>, <log_type.Warning: '[WARNING]'>] ID target strand count score_prefix score_suffix log_p offset ticks [DEBUG] not in [<log_type.Error: '[ERROR]'>, <log_type.Warning: '[WARNING]'>] 09.04.2019 15:35:11 [PID 33810] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads. Traceback (most recent call last):
File "scripts/STRique.py", line 682, in worker input = worker_callable(**input)
File "scripts/STRique.py", line 611, in detect self.__init_hmm__()
File "scripts/STRique.py", line 572, in __init_hmm__ self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
File "scripts/STRique.py", line 499, in add_target flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "scripts/STRique.py", line 397, in init self.__build_model__()
File "scripts/STRique.py", line 426, in __build_model__ self.bake(merge='All')
File "pomegranate/hmm.pyx", line 765, in pomegranate.hmm.HiddenMarkovModel.bake
TypeError: 'method' object is not iterable
[DEBUG] not in [<log_type.Error: '[ERROR]'>, <log_type.Warning: '[WARNING]'>] [DEBUG] not in [<log_type.Error: '[ERROR]'>, <log_type.Warning: '[WARNING]'>]
Sure, I am going to send you my config file. Thanks, Simone
No need to do so, it looks like pomegranate itself is causing the error. Can you please install STRique in a separate virtual environment. The problem is, that we had to freeze the pomegranate and networkx package versions in a previous release. Alternatively if you have the chance you may want to use the Docker container as it is stand-alone.
I was already running STRique in a separate conda virtual environment. Could you please conda list pomegranate and networkx versions?
they are in the requirements.txt, with
pip install -r requirements.txt --upgrade
you force an update
After repeating the installation today, STRique is now working. STRique output now contains as many rows as the number of reads mapped to the target region. I guess I should identify 2 clusters and average out the repeat counts in each cluster. Are there any utilities within the STRique framework to do that, or are there any future plans for that? Thanks, Simone
I'm happy to hear the pipeline is working. The interpretation is not part of STRique but filtering on prefix/suffix alignment scores might be a good first step to obtain reliable results. We will continue working on the accuracy of the tool. I can't say anything about extending it by further downstream analysis as these are very application specific. Pay
Dear STRique developers, I tried out your software, but got empty output a part from the header. I created a config file with the sequences of interest and mapped my MinION reads to human reference genome hg38 with minimap2. This is the command I used for running STRique:
And this is the output I got:
The target region I am interested in has coverage depth of about 13X. Is it enough? Is STRique able to differentiate between the two haplotypes? All tests I performed with
python3 scripts/STRique_test.py
gave PASS results. When looking in IGV at reads mapped to the target region I can roughly count the number of triplets. Do you have any suggestions? Thanks in advance