qferre closed this issue 5 years ago.
Opening the file in r+ mode instead of r fixes the problem, but is potentially error-prone: there is no reason those files should be edited.
arg_formatter.FormattedFile(mode='r+', file_ext='bed')
This bypasses the part of arg_formatter that calls make_tmp_file (which is only called when mode == 'r'). This could explain why r+ avoids the crash.
Clearly we should not open it in 'r+'.
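The difference between the two modes is easy to demonstrate with plain Python file handles. A minimal standard-library sketch (the temporary file and the helper try_write are illustrative, not part of pygtftk): a handle opened with 'r' rejects writes, whereas 'r+' lets a stray write silently overwrite the user's input.

```python
import io
import os
import tempfile

# Create a throwaway bed3 file standing in for a user-supplied input.
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as handle:
    handle.write("chr1\t100\t200\n")
    bed_path = handle.name


def try_write(path, mode):
    """Return True if writing through a handle opened with `mode` succeeds."""
    with open(path, mode) as fh:
        try:
            fh.write("oops")
            return True
        except io.UnsupportedOperation:
            return False


read_only_ok = try_write(bed_path, "r")    # 'r' forbids writes
read_write_ok = try_write(bed_path, "r+")  # 'r+' overwrites the start of the file

with open(bed_path) as fh:
    content_after = fh.read()  # the first 4 characters ('chr1') are now 'oops'

os.remove(bed_path)
```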
Are you able to read this file simply using the following?
file_bo = BedTool(string)
After investigation, it seems to fail precisely at line 482 of arg_formatter.py. Everything before that line works and everything after it does not, as ascertained by painstakingly adding print('Everything up to here works') statements :)
I believe this is because it is trying to set the name field, which does not exist since BedTool reads the file as a bed3. I'll keep investigating.
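The field_count < 4 test in arg_formatter depends on knowing how many columns the input has. A minimal standard-library sketch of that check, assuming (hypothetically) that track, browser and '#' header lines as well as blank lines should be skipped; bed_field_count is an illustrative name, not the pygtftk implementation:

```python
def bed_field_count(lines):
    """Return the number of tab-separated fields on the first data line.

    Header lines ('track', 'browser', '#') and blank lines are skipped.
    Returns 0 if no data line is found.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip():
            continue  # blank line
        if stripped.startswith(("track", "browser", "#")):
            continue  # header line, not a feature
        return len(stripped.split("\t"))
    return 0


bed3 = ["track name=peaks\n", "chr1\t100\t200\n"]
bed6 = ["chr1\t100\t200\tpeak_1\t0\t+\n"]
# A bed3 file has no 4th (name) column, so there is no name field to set.
```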
Yes... the code here is buggy.
for record in file_bo:
    if field_count < 4:
        record.name = 'region_' + str(region_nb)
        fields = record.fields[0:3]
        fields += [record.name,
                   record.score,
                   record.strand]
        tmp_file.write("\t".join(fields))
Should be replaced by something like:
for record in file_bo:
    if field_count < 4:
        # build the name locally: a bed3 record has no name field to assign to
        name = 'region_' + str(region_nb)
        fields = record.fields[0:3]
        fields += [name,
                   '0',
                   '.']
        tmp_file.write("\t".join(fields))
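The proposed fix can be sketched without pybedtools to check the output format. This follows the snippet's choices (name region_<n>, score '0', strand '.'); the counter handling (starting at 1, incremented once per record) and the helper name pad_bed3_lines are assumptions for illustration. Each record is built as a new list instead of being assigned onto the parsed record, and written with a trailing newline:

```python
import io


def pad_bed3_lines(lines, out):
    """Write each bed3 record to `out` as a bed6 line.

    Hypothetical helper: name is 'region_<n>', score is '0', strand is '.'.
    """
    region_nb = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        region_nb += 1
        # Build a new list instead of mutating the parsed record.
        padded = fields[0:3] + ["region_" + str(region_nb), "0", "."]
        out.write("\t".join(padded) + "\n")  # the newline terminates each record


buffer = io.StringIO()
pad_bed3_lines(["chr1\t100\t200\n", "chr2\t5\t50\n"], buffer)
```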
There is also the question of what will happen with unstranded features...
pybedtools is known to segfault when iterating over BedFile objects under certain conditions: https://github.com/daler/pybedtools/issues/82
I had already tried to fix it by making pretty much the same modification as the one you posted in the comment above (also adding record.strand and record.score, to no effect), but I was looking for a cleaner solution.
Remember to make this modification in the develop branch as well.
(Sent by email, in reply to Quentin Ferré's comment of Thu, 10 Jan 2019, 15:40: https://github.com/dputhier/pygtftk/issues/47#issuecomment-453118372)
Denis Puthier laboratoire INSERM TAGC/INSERM U 1090 Parc Scientifique de Luminy case 928 163, avenue de Luminy 13288 MARSEILLE cedex 09 FRANCE Mail: denis.puthier@univ-amu.fr Tel: (National) 04 91 82 87 31 / (International) 33 4 91 82 87 31 Fax: (National) 04 91 82 87 01 / (International) 33 4 91 82 87 01
Web:
http://tagc.univ-mrs.fr/tagc/index.php/research/network-bioinformatics/dputhier
====================================================================
With a fix practically identical to yours, there is no segfault anymore, but I now get this error: "gtftk peak_anno: error: argument -p/--peak-file: invalid FormattedFile('r') value: 'bed3_h3k4me3.bed'". Any idea?
Did you try:
type=arg_formatter.FormattedFile(mode='r', file_ext='bed')
This is already what is in the arg_parser of peak_anno.
And it still works with a bed6?
Yep.
Could you try renaming the file to toto.bed, just in case something weird is going on with the regexp?
Still the same error.
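The wording "invalid FormattedFile('r') value: ..." is argparse's generic message for when the callable passed as type= raises ValueError or TypeError: argparse discards the original exception text and substitutes the repr of the type object, so the real cause (here, a malformed temp file) stays hidden. Raising argparse.ArgumentTypeError instead keeps the custom message. A minimal sketch with two hypothetical type classes (the resemblance of FormattedFile to argparse.FileType is an assumption based on the repr format; exit_on_error requires Python 3.9+):

```python
import argparse


class MaskedBedType:
    """Hypothetical type callable that fails with a plain ValueError."""

    def __repr__(self):
        return "MaskedBedType('r')"

    def __call__(self, value):
        # argparse swallows this message and reports a generic one.
        raise ValueError("line 1 has no trailing newline")


class ExplicitBedType:
    """Hypothetical type callable that reports the real cause."""

    def __call__(self, value):
        # ArgumentTypeError messages are shown to the user as-is.
        raise argparse.ArgumentTypeError(value + ": line 1 has no trailing newline")


def error_message(type_obj):
    """Parse '-p toto.bed' and return the argparse error text."""
    parser = argparse.ArgumentParser(prog="demo", exit_on_error=False)
    parser.add_argument("-p", "--peak-file", type=type_obj)
    try:
        parser.parse_args(["-p", "toto.bed"])
    except argparse.ArgumentError as err:
        return str(err)
    return ""


masked = error_message(MaskedBedType())
explicit = error_message(ExplicitBedType())
```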
There is also something that seems to depend on the pybedtools version. With a bed3 file, I am able to write something like:
import pybedtools
pybedtools.__version__ # '0.8.0'
from pybedtools import BedTool
a = BedTool("test.bed")
for i in a:
    pass
i.name # '.'
i.name = 'bla'
In this version, the name/score/strand attributes default to '.'. So we should leave the code unchanged and ensure during installation that the pybedtools version is at least 0.8.0.
That will probably be easier and save us a lot of headaches :) I'll revert all my modifications to arg_formatter (I had not committed them anyway).
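Enforcing the minimum version could be done declaratively in the package metadata (for example install_requires=['pybedtools>=0.8.0'] in setup.py) or with a runtime guard. A minimal sketch of the runtime option; version_at_least and check_pybedtools are hypothetical helpers that only handle plain dotted numeric versions such as '0.8.0' (the packaging library's version parsing would be more robust for pre-release tags):

```python
def version_at_least(installed, minimum):
    """Return True if dotted numeric version `installed` >= `minimum`."""
    current = [int(part) for part in installed.split(".")]
    required = [int(part) for part in minimum.split(".")]
    # Pad with zeros so that '0.8' and '0.8.0' compare as equal.
    width = max(len(current), len(required))
    current += [0] * (width - len(current))
    required += [0] * (width - len(required))
    # Lists compare element by element, so 0.10.0 > 0.8.0 (unlike strings).
    return current >= required


def check_pybedtools(installed_version, minimum="0.8.0"):
    """Raise RuntimeError if the installed pybedtools is older than `minimum`."""
    if not version_at_least(installed_version, minimum):
        raise RuntimeError(
            "pybedtools >= %s is required (found %s)" % (minimum, installed_version)
        )


# In pygtftk this could be called as:
#     import pybedtools
#     check_pybedtools(pybedtools.__version__)
```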
What is your version?
Hmm... the version in my pygtftk virtual environment was 0.8.0 too...
Print pybedtools.__file__ to check which installation is actually being imported...
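A minimal sketch of that check using importlib. describe_module is a hypothetical helper, and the standard-library json module stands in for pybedtools only so the snippet runs anywhere; a mismatch between the reported path and the expected virtual environment would point to a different installation being imported.

```python
import importlib


def describe_module(name):
    """Return (file path, version or None) of the module Python actually imports."""
    module = importlib.import_module(name)
    return module.__file__, getattr(module, "__version__", None)


# In the pygtftk virtual environment one would call describe_module("pybedtools").
path, version = describe_module("json")
print(path, version)
```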
(Sent by email, in reply to Quentin Ferré's comment of Thu, 10 Jan 2019, 16:21: https://github.com/dputhier/pygtftk/issues/47#issuecomment-453132969)
Yeah, that's what I did.
Fixed it. The problem was simply that a newline character has to be added at the end of each line while the temp file is being generated.
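Why the missing newline breaks the temp file: if each record is written with "\t".join(fields) alone, consecutive records run together on a single line, so any parser sees one malformed record. A minimal sketch of both variants (the two records and the helper write_records are made up for illustration):

```python
import io

records = [
    ["chr1", "100", "200", "region_1", "0", "."],
    ["chr1", "300", "400", "region_2", "0", "."],
]


def write_records(rows, terminate):
    """Serialize rows as tab-separated lines, optionally ending each with '\\n'."""
    out = io.StringIO()
    for fields in rows:
        line = "\t".join(fields)
        out.write(line + "\n" if terminate else line)
    return out.getvalue()


broken = write_records(records, terminate=False)
fixed = write_records(records, terminate=True)

broken_lines = broken.splitlines()  # a single line: '.' and 'chr1' fuse into '.chr1'
fixed_lines = fixed.splitlines()    # two lines of six fields each
```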
I have also implemented the fix for the buggy code discussed above.
The fix is on the peak_anno_shuffling branch; should I push it to the develop branch as well?
Clarification: the newline character fixed the second problem (the invalid FormattedFile('r') value error), not the segfault.
The segfault was fixed by no longer trying to assign to the non-existent record.name, as pybedtools can have problems with certain operations when they are performed inside iterators.
Update: the fix is now part of the develop branch.
Using peak_anno with a bed3 file triggers a segfault. No other messages are displayed, even with high verbosity.
Command example:
The command works fine when the exact same file is converted to a bed6 with filler values (e.g. 'chr1 100 200' becomes 'chr1 100 200 A B C').
I suspect this is related to the argument formatter and could appear in other parts of the pygtftk project.