cgat-developers / cgat-apps

Bigwig files in bam2geneprofile #36

Closed idyonchev closed 4 years ago

idyonchev commented 5 years ago

If I try to do a metagene profile over a Bigwig file using, for example:

cgat bam2geneprofile --bigwigfile=bigwig.bw --gtf-file=geneset.filtered.gtf.gz --reporter=gene --method=intervalprofile --normalize-transcript=total-sum --normalize-profile=area

I get the error:

Traceback (most recent call last):
  File "/shared/sudlab1/General/CGAT-FLOW/cgat-flow/conda-install/envs/cgat-flow/bin/cgat", line 11, in <module>
    load_entry_point('cgat', 'console_scripts', 'cgat')()
  File "/shared/sudlab1/General/CGAT-FLOW/cgat-flow/cgat-apps/cgat/cgat.py", line 132, in main
    module.main(sys.argv)
  File "/shared/sudlab1/General/projects/CGAT-FLOW/cgat-flow/cgat-apps/cgat/tools/bam2geneprofile.py", line 679, in main
    wigfiles = [BigWigFile(x) for x in options.infiles]
  File "/shared/sudlab1/General/CGAT-FLOW/cgat-flow/cgat-apps/cgat/tools/bam2geneprofile.py", line 679, in <listcomp>
    wigfiles = [BigWigFile(file=open(x)) for x in options.infiles]
NameError: name 'BigWigFile' is not defined

This is because BigWigFile is never imported. In fact, there is no BigWigFile class in pyBigWig (which is what I am assuming you are using from looking at the conda environment).

If I add import pyBigWig to the imports and change line 679 to

wigfiles = [pyBigWig.open(x) for x in options.infiles]

which is the correct way to open a bigwig in pyBigWig, then I get the error:

Traceback (most recent call last):
  File "/shared/sudlab1/General/CGAT-FLOW/cgat-flow/conda-install/envs/cgat-flow/bin/cgat", line 11, in <module>
    load_entry_point('cgat', 'console_scripts', 'cgat')()
  File "/shared/sudlab1/General/CGAT-FLOW/cgat-flow/cgat-apps/cgat/cgat.py", line 132, in main
    module.main(sys.argv)
  File "/shared/sudlab1/General/CGAT-FLOW/cgat-flow/cgat-apps/cgat/tools/bam2geneprofile.py", line 817, in main
    gtf_iterator)
  File "cgat/BamTools/geneprofile.pyx", line 1751, in cgat.BamTools.geneprofile.countFromGTF
  File "cgat/BamTools/geneprofile.pyx", line 685, in cgat.BamTools.geneprofile.IntervalsCounter.update
  File "cgat/BamTools/geneprofile.pyx", line 1610, in cgat.BamTools.geneprofile.RegionCounter.count
  File "cgat/BamTools/geneprofile.pyx", line 1046, in cgat.BamTools.geneprofile.GeneCounter.count
  File "cgat/BamTools/geneprofile.pyx", line 63, in cgat.BamTools.geneprofile.RangeCounter.getCounts
  File "cgat/BamTools/geneprofile.pyx", line 473, in cgat.BamTools.geneprofile.RangeCounterBigWig.count
AttributeError: 'pyBigWig.bigWigFile' object has no attribute 'get'

I'm not sure where to go from here?

Acribbs commented 5 years ago

It's strange that this hasn't been picked up previously. After a quick look, there seems to be a test for this. I will check whether the test is wrongly implemented.

Modifying the code to use pyBigWig was correct, although I can't tell if BigWigFile is a cgat or pyBigWig function.

It seems that the offending line is this: https://github.com/cgat-developers/cgat-apps/blob/ac7885fc30ee488fe060a473402886b086518a3d/cgat/BamTools/geneprofile.pyx#L473

Looking at the pyBigWig manual, to extract values you need to change .get(contig, start, end) to .values(contig, start, end). Could you change this and let me know if it worked?
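
For illustration, here is a minimal sketch of reading per-base values through .values(contig, start, end). It is not the code in geneprofile.pyx. The helper name per_base_values and the FakeBigWig stand-in are hypothetical, and replacing missing positions with 0.0 is an assumption about what the counter wants. pyBigWig reports positions without data as NaN, so some handling of missing values is likely needed. If the old .get call returned something other than a per-base list, the loop around it would need adjusting as well.

```python
import math


def per_base_values(bigwig, contig, start, end):
    """Return one float per base in [start, end), with missing data as 0.0.

    `bigwig` is assumed to be a handle from pyBigWig.open(path); its
    values() method returns NaN for positions without data.
    """
    return [0.0 if math.isnan(v) else v
            for v in bigwig.values(contig, start, end)]


class FakeBigWig:
    """Hypothetical stand-in for a pyBigWig handle, so this runs without a file."""

    def __init__(self, data):
        self.data = data  # {contig: list of per-base floats}

    def values(self, contig, start, end):
        return self.data[contig][start:end]


demo = FakeBigWig({"chr1": [1.0, float("nan"), 2.5, 0.0]})
print(per_base_values(demo, "chr1", 0, 3))  # [1.0, 0.0, 2.5]
```

With a real file, the handle would come from pyBigWig.open("bigwig.bw") instead of the stand-in.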

Since this is a pyx file, you may need to run python setup.py develop in the cgat-apps folder and then python cgat/tools/cgat_rebuild_extensions.py.

I don't have test data to try at the moment, but I will create some and get back to you if it doesn't work.

IanSudbery commented 4 years ago

I've been working on this. PR incoming.

BTW python cgat/tools/cgat_rebuild_extensions.py no longer works, as it assumes naming and location conventions that no longer hold. Is there a best practice replacement?

Acribbs commented 4 years ago

Great.

Ah no, I don't think there is a replacement. The locations and names should be updated in cgat/tools/cgat_rebuild_extensions.py. I know there was some significant rearrangement of the repo a while ago, but I wasn't expecting problems with regenerating Cython extensions.

Acribbs commented 4 years ago

Indeed, you're correct. Are you planning on updating cgat/tools/cgat_rebuild_extensions.py? Otherwise I'm happy to have a look at it.

IanSudbery commented 4 years ago

I don't actually know how. The extensions that need rebuilding are all over the place now, and cgat_rebuild_extensions relies on the extension having the same name as the script calling it, which is no longer the case.

So far, just running setup.py develop has been sufficient for me, but I don't know if there are times when this is not the case, or when it would be undesirable.

Acribbs commented 4 years ago

I've just seen that setup.py has a Cython build_ext import (https://github.com/cgat-developers/cgat-apps/blob/2696e526ff07897594f6c978086999053d97e8e9/setup.py#L57) that should cythonize the repo when running setup.py develop, so cgat/tools/cgat_rebuild_extensions.py shouldn't be necessary.
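
One quick way to check which file Python would load for an extension after running setup.py develop is to look up its import spec. This is only a sketch: the helper name find_module_origin is hypothetical, the module name is taken from the traceback earlier in the thread, and the lookup shows where the module resolves from, not whether it was freshly recompiled.

```python
import importlib.util


def find_module_origin(name):
    """Return the file path Python would load `name` from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # ImportError: a parent package is missing or fails to import.
        # ValueError: the module is loaded but has no usable __spec__.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Module name from the traceback above; prints None if cgat is not installed.
    print(find_module_origin("cgat.BamTools.geneprofile"))
```

If the printed path points at a compiled file inside the cgat-apps checkout, the develop install has picked up the extension from the working tree.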