Closed dputhier closed 6 years ago
I pushed a fix for this issue. Should work now.
OK. Last tests were OK. There are still some issues. For instance, using RAB13:
$ gtftk get_example -d mini_real | gtftk select_by_key -k gene_name -v RAB13 | gtftk select_by_key --select-transcripts | gtftk 5p_3p_coord -t transcript | bedtools sort
chr1 153985361 153985362 ENSG00000143545|ENST00000484297 0 -
chr1 153985365 153985366 ENSG00000143545|ENST00000614713 0 -
chr1 153986245 153986246 ENSG00000143545|ENST00000462680 0 -
chr1 153986351 153986352 ENSG00000143545|ENST00000368575 0 -
chr1 153986357 153986358 ENSG00000143545|ENST00000495720 0 -
Unfortunatly, calling select_most_5p_tx, we don't get the most 5'.
$ gtftk get_example -d mini_real | gtftk select_most_5p_tx | gtftk select_by_key -k gene_name -v RAB13 | gtftk select_by_key --select-transcripts | gtftk 5p_3p_coord -t transcript | bedtools sort
chr1 153985365 153985366 ENSG00000143545|ENST00000614713 0 -
The same problem is encountered on CRABP2.
Maybe it is not strand specific ?
Also, what is the rule in case of Ties ?
Hi Fa, Did you have any time to have a look to this one ? Best
Hello, I'm back to GTFTK. I'll try to look at that this afternoon.
I suppressed select_transcript2 in the last push. It works now on my conda environment. Can you try it ?
I have to replace calls to select_transcript2 by calls to select_transcript ?
Yes !
I tried to include the last libgtftk version but I got the following message:
gtftk get_example -d mini_real | gtftk feature_size -t mature_rna
Exception AttributeError: AttributeError("function/symbol 'free_gtf_data' not found in library '/usr/local/lib/libgtftk.so': dlsym(0x7fddfd115170, free_gtf_data): symbol not found",) in <object repr() failed> ignored
Traceback (most recent call last):
File "/Users/puthier/miniconda2/envs/pygtftk/bin/gtftk", line 4, in <module>
__import__('pkg_resources').run_script('pygtftk==0.8.0.dev0+9422', 'gtftk')
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pkg_resources/__init__.py", line 657, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1437, in run_script
exec(code, namespace, namespace)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/EGG-INFO/scripts/gtftk", line 77, in <module>
args = main()
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/EGG-INFO/scripts/gtftk", line 63, in main
CmdManager.run(args)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/pygtftk/cmd_manager.py", line 897, in run
fun(**args)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/pygtftk/plugins/feat_size.py", line 99, in feature_size
gtf = GTF(inputfile)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/pygtftk/gtf_interface.py", line 442, in __init__
for i in tab:
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/pygtftk/gtf_interface.py", line 1096, in extract_data_iter_list
ptr = self._dll.extract_data(self._data, keys_csv, base, nr)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/cffi/api.py", line 882, in __getattr__
make_accessor(name)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/cffi/api.py", line 878, in make_accessor
accessors[name](name)
File "/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/cffi/api.py", line 808, in accessor_function
value = backendlib.load_function(BType, name)
AttributeError: function/symbol 'extract_data' not found in library '/usr/local/lib/libgtftk.so': dlsym(0x7fb78693d350, extract_data): symbol not found
Exception AttributeError: AttributeError("function/symbol 'free_gtf_data' not found in library '/usr/local/lib/libgtftk.so': dlsym(0x7fb78693d350, free_gtf_data): symbol not found",) in <bound method GTF.__del__ of
- A GTF object containing 0 lines:
- File connection: -
I deleted this library (/usr/local/lib/libgtftk.so) which was, I guess, a relic from an old installation (I'm under a conda env). Maybe it was just indicating that something went wrong upon compilation. Now the message is:
gtftk -h
|--- 17:26-WARNING : Failed to load plugin :/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/pygtftk/plugins/5p_3p_coords.py
cannot load library '/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/pygtftk/lib/libgtftk.so': dlopen(/Users/puthier/miniconda2/envs/pygtftk/lib/python2.7/site-packages/pygtftk-0.8.0.dev0+9422-py2.7-macosx-10.9-x86_64.egg/pygtftk/lib/libgtftk.so, 2): Symbol not found: _get_blast_header
This seem to be related to this new function related to blast parsing.
I don't have this problem on blackflag in a conda environment.
Can you clone a new repo and test ?
Works fine. But when I run "gtftk -h", I have this message :
Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.
I just have to reinstall matplotlib (conda install matplotlib) to remove the message.
I will have a look for matplotlib (now required as R dependencies have been replaced by Python code). Concerning "select_transcript" update, can you go to "feature_libgtftk_h80" branch. It contains the last update of libgtftk. Best
I just did that and it still works fine ! "gtftk get_example -d mini_real | gtftk feature_size -t mature_rna" runs like a charm !
Hum... I clone it too and it seems ~ OK. The problem is that several test are not successfuly executed. I have to check.
Ah, OK. Now I get some additional lines with "?". Nice. I will have to deal with it.
Yes a lot of test fail because of "?". Especially, you have to update "tabulate".
The command
gtftk get_example -d mini_real | gtftk select_most_5p_tx | gtftk select_by_key -k gene_name -v RAB13 | gtftk select_by_key --select-transcripts | gtftk 5p_3p_coord -t transcript | bedtools sort
you put at the beginning of this issue doesn't work because "gtftk get_example -d mini_real" give a GTF without gene lines. "select_most_5p_tx" function needs gene lines to perform. So if you add a "gtftk convert_ensembl" before calling "gtftk select_most_5p_tx", that works fine.
This one is fixed in cd2bda89901d8d7de926148c6df8260bc7640393
Hi, Working a little bit on improving tests I realized that select_trancript with argument 3 is not selecting transcript with the most 5' TSS.
Without call to select_most_5p_tx. Selecting gene ISG15 we obtain 3 transcripts.
Now we also ask for the most 5':
Unfortunately, on forward strand, we expect ENST00000624697.