Closed marrip closed 3 years ago
I think this may be because I messed up my branches by accident when doing some recent work to update that particular module. So you are basically using a half-deployed version of that code. I am planned to do a merge (very) soon, but in the meantime you can try pulling the sv_caller branch and see if it fixes the problem.
This is also directly related to issue #3 and will hopefully be corrected moving forward.
Haha, you are little too fast. The branch disappeared and now I see you have a version tag. Is v1.1.0 the one you would recommend?
Yeah, realized after I posted that message the development on that branch was pretty much done other than bug fixes...
So I merged it into master and generated a new release. Try v1.1.0 for now and let me know if you run into any bugs. I think v.1.0.0 may be a little more stable (and reflects the release at the time of the medrxiv preprint) so you may choose to revert to that if necessary.
I'll leave this open to track the original issue if necessary...
Hey again,
I tried v1.1.0 running database
and encountered this one:
...
bwa ran with code: 0
Traceback (most recent call last):
File "/indelible/indelible.py", line 146, in <module>
indelible.build_database(args.fof, args.output_path, args.reference_path, config, args.priors, args.bwa_thread)
File "/indelible/indelible/build_database.py", line 227, in build_database
v = add_alignment_information(v, decisions, repeat_info)
File "/indelible/indelible/build_database.py", line 73, in add_alignment_information
if int(other_coord[1]) < curr_bp["pos"]:
ValueError: invalid literal for int() with base 10: 'KN707671v1'
Seems like indelible tries to read some string as an integer which cannot be transformed. I did not reproduce the output from the previous steps. Should I have done that? I will try v1.0
and let you know if I get past this point βΊοΈ
Update: With v1.0
the task completes without any errors and completes also much faster than with v1.1.0
. Could you tell me why?
Running v1.0
, I tried the annotate
and got this error:
Traceback (most recent call last):
File "/indelible/indelible.py", line 164, in <module>
indelible.annotate(args.input_path, args.output_path, args.database, config)
File "/indelible/indelible/annotate.py", line 378, in annotate
bhash = create_blast_hash(input_path)
File "/indelible/indelible/annotate.py", line 295, in create_blast_hash
for row in csv.DictReader(open(blast_nonrepeats_path, 'r'), delimiter="\t"):
FileNotFoundError: [Errno 2] No such file or directory: 'indelible/duplicates_marked.bam.counts.scored.fasta.hits_nonrepeats'
Indeed this file doesn't exist - which task does produce this?
I think you may be using the README for v1.1.0 to run v1.0.0. There is an additional module that has been removed ("Blast") whose functionality is now within "Database" in v1.1.0.
As for the issue in v1.1.0 β are you just running one sample? If so, could you share your output for the score command (something like *.bam.counts.scored
if run similar to the README)?
Hey,
I did indeed use the README for v1.1.0 - thought it could be rather similar π
. But now I took a look and tried to run the blast
command, however, I need a windowmasker db and I had some troubles running windowmasker
myself due to some dependencies. I am using the reference genome GRCh38. I will try to fix it next week and see if I can get further with the analysis.
Regarding sharing the files, I am a bit hesitant as this is actual patient data. I took a look at the .scored
file though and found that the string KN707671v1
stems from the chromosome ID chrUn_KN707671v1_decoy
. Should these be excluded from the reference before analysis?
Yeah, I would probably try to stick with the current version so that it is easier for me to support. windowmasker
is annoying to run, and I couldn't include it in the repo due to it's size...
As for sharing, it doesn't necessarily have to be public here on github. The file should be reasonably small and if you strip patient identifiers from it we can arrange a secure share using open ssl, something like:
echo "PASSWORD" | openssl enc -md MD5 -pass stdin -aes-256-cbc -salt -in sample.scored > sample.enc
Obviously don't use "PASSWORD"... But you can send to be with a private message/email (in the readme).
Up to you but would help me to debug the issue. The chromosome column should not get fed into that particular line in the code, so there is likely a more complicated formatting thing going on here that is going to be difficult for me to diagnose without the original file. I think even a slice (like 10 lines above and below) would be sufficient.
Hey, I created a slice including header and additional lines before and after the offending line. Should I send it to your mail address?
Yup!
I think I've figured out the issue. These "chromosomes" contain "_" characters in their names and I code some of the data with "_" as delimiters which I split on (don't know why I realized just now that this is a bad idea.....).
I have pushed a fix live, so pull the latest master branch and let me know if it fixes your issues. I will incorporate this fix into a later full release at some point.
Thanks for your patience!
Hey Eugene,
now I was able to run all the steps until annotate
. Then I got this error:
Traceback (most recent call last):
File "/indelible/indelible.py", line 155, in <module>
indelible.annotate(args.input_path, args.output_path, args.database, config)
File "/indelible/indelible/annotate.py", line 224, in annotate
v = attach_db(v, db)
File "/indelible/indelible/annotate.py", line 119, in attach_db
db_entry = db[key]
KeyError: 'chr1:104160'
I saw that you changed the separators for chromosomes and positions so I guess it has something to do with that?
I think this is a separate issue. I was able to run InDelible without issue using my own tests. I think this has to do with the 'chr' leader on the entry. I have stripped it pretty much everywhere else, but I seem to have missed it here.
I have pushed a bug fix β go ahead and test it for me. Again, sorry for the issues, but these issues seem to only show up when other people start using it...
There seems to be a problem still:
Traceback (most recent call last):
File "/indelible/indelible.py", line 155, in <module>
indelible.annotate(args.input_path, args.output_path, args.database, config)
File "/indelible/indelible/annotate.py", line 224, in annotate
v = attach_db(v, db)
File "/indelible/indelible/annotate.py", line 119, in attach_db
db_entry = db[key]
KeyError: '1:104160'
I have run InDelible on my own Hg38 test data and cannot replicate your issue. Can you share your database and annotate command?
Are you potentially using the "database" file included with InDelible rather than the one that database creates?
Also, can you check if that the entry "1:104160" actually exists in your database file that you generated with 'database'?
I think I do π I run this:
indelible.py annotate --i sample.bam.counts.scored --o sample.bam.counts.scored.annotated --c config.yml --d Indelible_db_10k.hg38.bed
and Indelible_db_10k.hg38.bed
is from the the included zip
file. Indeed, the position exists in my tsv
database file. I think I got confused by the format difference tsv
vs. bed
. Is there some kind of shortcut to generate a bed
from the database?
I think based on your experience I need to make the README more clear and make sure file names are consistent (file suffixes don't matter here...). There are two options:
indelible.py database --f fofn.txt --o db.txt ... <other options>
indelible.py annotate --i file.scored --d db.txt ... <other options>
indelible.py database --f fofn.txt --o db.txt --priors Indelible_db_10k.hg38.bed ... <other options>
indelible.py annotate --i file.scored --d db.txt ... <other options>
Hope that clears things up? I have fixed the README to hopefully be more accurate of the above workflow.
Much better indeed π I will try if I get it right and let you know. Thanks for being patient and responding quickly.
Have a great weekend.
No worries. These are mistakes I've made so thanks for catching them for me. Let me know how it goes and I will close this issue.
Hey Eugene,
every step ran smoothly now and I get a tsv
in the end, now I am trying to do the complete
run. I was also wondering about a vcf
as output, is there any method to produce that?
Closing this issue since the original issue seems to have been fixed.
See #6 for new responses.
hey,
I am running indelible on some patient data and could run the steps
fetch
,aggregate
andscore
without any problems. However, now I am runningdatabase
which runs and logs some values until it throws this error:I am not completely sure what is causing the error but I saw that in
bwa_runner.py
is doing some calculations which might cause the error. Any kind of feedback would be highly appreciated π