deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

AttributeError: 'csamtools.AlignedRead' object has no attribute 'has_tag' #42

Closed aj03 closed 7 years ago

aj03 commented 8 years ago

I am facing this problem when running HiCExplorer software:

command: hicBuildMatrix -s mapping/SRR1956527_1.bam mapping/SRR1956527_2.bam -rs dpnII_positions_GRCm38.bed -seq GATC -b hiCmatrix/SRR1956527_ref.bam -o hiCmatrix/SRR1956527.matrix

reading mapping/SRR1956527_1.bam and mapping/SRR1956527_2.bam to build hic_matrix
Minimum distance considered between restriction sites is 300
Max distance: 800
Matrix size: 2666241
dangling sequences to check are {'pat_forw': 'ATC', 'pat_rev': 'GAT'}
Traceback (most recent call last):
  File "/usr/local/bin/hicBuildMatrix", line 7, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicBuildMatrix.py", line 644, in main
    mate1_supplementary_list = get_supplementary_alignment(mate1, str1)
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicBuildMatrix.py", line 410, in get_supplementary_alignment
    if read.has_tag('SA'):
AttributeError: 'csamtools.AlignedRead' object has no attribute 'has_tag'

pysam version:

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkg_resources
>>> pkg_resources.get_distribution("pysam").version
'0.9.1.4'

Could you please help me solve this? Thanks in advance.

fidelram commented 8 years ago

I have tested HiCExplorer with pysam 0.8.3

I will check what is the problem with pysam 0.9.1.4


fidelram commented 8 years ago

I just checked and the has_tag attribute was added in pysam version 0.8.2

I tested the code with pysam 0.9.1.4 and didn't have any trouble. I added commit 9ad2a3efabb6207991d283f08b2edb9f163b9336 to test for the pysam version. This will report the pysam version that the code is actually using.
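A minimal sketch of what such a version test could look like, assuming plain dotted version strings. This is not the code from the commit; the helper names version_tuple and meets_minimum are hypothetical, and since it is not stated here whether the real check is strict or inclusive, the sketch treats the minimum itself as acceptable:

```python
import re


def version_tuple(text, width=4):
    """Turn a dotted version string such as '0.9.1.4' into a tuple of ints.

    Only the leading digits of each dot-separated piece are used, and the
    result is padded with zeros up to `width` so that '0.8' equals '0.8.0'.
    """
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric piece, e.g. 'dev'
        numbers.append(int(match.group()))
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (inclusive comparison)."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # The two pysam versions mentioned in this thread, checked against 0.8.3.
    for installed in ("0.6", "0.9.1.4"):
        print(installed, meets_minimum(installed, "0.8.3"))
```

Comparing integer tuples rather than raw strings avoids the pitfall where "0.10" sorts before "0.9" as text.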

I would guess that in your case you have more than one pysam version installed and hicexplorer is using an older one.

Try to run the code as /path/to/python hicBuildMatrix -s .....
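One way to see which copy of a package a given interpreter actually imports is to print the module's version and file path from that same interpreter. This is only a sketch, written for a current Python 3 rather than the Python 2.7 used in this thread; describe_module is a hypothetical helper name, and it assumes the package exposes __version__, which not every package does:

```python
import importlib
import sys


def describe_module(name):
    """Return interpreter path, version and file of a module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "python": sys.executable,  # the interpreter running this code
        "version": getattr(module, "__version__", None),  # None if not exposed
        "file": getattr(module, "__file__", None),  # where the module was loaded from
    }


if __name__ == "__main__":
    # Replace "pysam" with any package you want to inspect.
    print(describe_module("pysam"))
```

Running this with each Python installation on the machine should show whether they pick up different copies of the package.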

aj03 commented 8 years ago

Thanks fidelram.. It worked :)

Actually, the problem was that I had more than one pysam version installed (0.9.1.4 and 0.6), and hicexplorer was using the older one (0.6):

hicBuildMatrix -s ../mapping/SRR1956527_1.bam ../mapping/SRR1956527_2.bam -rs ../dpnII_positions_GRCm38.bed -seq GATC -b SRR1956527_ref.bam -o SRR1956527.matrix
ERROR Version of pysam has to be higher than 0.8.3. Current installed version is 0.6

The has_tag pysam error is resolved now.

Thanks once again :)

fidelram commented 8 years ago

I think that this function (in-place addition, +=, on sparse matrices) was added to scipy recently. I am using scipy 0.17. Can you update your scipy version?

Meanwhile I will update the versions in the setup.py and add further version tests to avoid this issue in the future.

On Thu, Nov 10, 2016 at 12:04 PM, aj03 notifications@github.com wrote:


The has_tag pysam error is resolved now, but it's showing a new error:

hicBuildMatrix -s ../mapping/SRR1956527_1.bam ../mapping/SRR1956527_2.bam -rs ../dpnII_positions_GRCm38.bed -seq GATC -b SRR1956527_ref.bam -o SRR1956527.matrix

reading ../mapping/SRR1956527_1.bam and ../mapping/SRR1956527_2.bam to build hic_matrix
Minimum distance considered between restriction sites is 300
Max distance: 800
Matrix size: 2666241
dangling sequences to check are {'pat_forw': 'ATC', 'pat_rev': 'GAT'}
processing 1000000 lines took 26.08 secs (38347.3 lines per second)
244810 (24.48%) valid pairs added to matrix
processing 2000000 lines took 52.42 secs (38154.5 lines per second)
481367 (24.07%) valid pairs added to matrix
processing 3000000 lines took 79.80 secs (37592.9 lines per second)
712799 (23.76%) valid pairs added to matrix
processing 4000000 lines took 104.52 secs (38270.7 lines per second)
942385 (23.56%) valid pairs added to matrix
processing 5000000 lines took 129.21 secs (38695.9 lines per second)
1176656 (23.53%) valid pairs added to matrix
processing 6000000 lines took 155.77 secs (38517.8 lines per second)
1410029 (23.50%) valid pairs added to matrix
processing 7000000 lines took 180.14 secs (38858.2 lines per second)
1645092 (23.50%) valid pairs added to matrix
processing 8000000 lines took 204.88 secs (39047.4 lines per second)
1889359 (23.62%) valid pairs added to matrix
processing 9000000 lines took 230.03 secs (39125.0 lines per second)
2132262 (23.69%) valid pairs added to matrix
processing 10000000 lines took 254.96 secs (39221.8 lines per second)
2373752 (23.74%) valid pairs added to matrix
Traceback (most recent call last):
  File "/usr/local/bin/hicBuildMatrix", line 7, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicBuildMatrix.py", line 881, in main
    hic_matrix += coo_matrix((data, (row, col)), shape=(matrix_size, matrix_size))
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 387, in __iadd__
    raise NotImplementedError
NotImplementedError


fidelram commented 8 years ago

@aj03 Did it work after updating scipy?

aj03 commented 8 years ago

Hi, I updated the scipy version and even setup.py, but ended up with this error:

processing 47000000 lines took 1442.25 secs (32587.9 lines per second)
10874271 (23.14%) valid pairs added to matrix
processing 48000000 lines took 1471.86 secs (32611.8 lines per second)
11105341 (23.14%) valid pairs added to matrix
Traceback (most recent call last):
  File "/usr/local/bin/hicBuildMatrix", line 5, in <module>
    pkg_resources.run_script('HiCExplorer==1.3', 'hicBuildMatrix')
  File "/usr/local/lib/python2.7/dist-packages/distribute-0.6.15-py2.7.egg/pkg_resources.py", line 467, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/distribute-0.6.15-py2.7.egg/pkg_resources.py", line 1200, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/HiCExplorer-1.3-py2.7.egg/EGG-INFO/scripts/hicBuildMatrix", line 7, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/hicBuildMatrix.py", line 690, in main
    mate2.pos):
  File "/usr/local/lib/python2.7/dist-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/hicBuildMatrix.py", line 64, in is_duplicated
    self.pos_matrix[pos1, pos2] = 1
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/dok.py", line 248, in __setitem__
    dict.__setitem__(self, (int(i), int(j)), v[()])
MemoryError

I tried to solve it but wasn't able to.

scipy version:

>>> pkg_resources.get_distribution("scipy").version
'0.18.1'

fidelram commented 8 years ago

I am afraid that you may be running out of memory. Processing of Hi-C data unfortunately requires quite some memory and a 64-bit system.

The particular part that is failing for you is the detection of duplicated reads.
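To illustrate why this step is memory-hungry, here is a simplified sketch of duplicate detection by position pair. It is not HiCExplorer's implementation (the traceback above shows a scipy dok matrix being filled in is_duplicated); the class name DuplicateTracker and its arguments are hypothetical, and a plain Python set stands in for the matrix. In either form, one entry is kept for every distinct pair of mate positions, so memory grows with the number of unique read pairs:

```python
class DuplicateTracker:
    """Remember every (chrom1, pos1, chrom2, pos2) combination seen so far."""

    def __init__(self):
        self._seen = set()

    def is_duplicated(self, chrom1, pos1, chrom2, pos2):
        """Return True if this exact pair of positions was seen before.

        The key is order-sensitive: swapping the two mates gives a new key.
        """
        key = (chrom1, pos1, chrom2, pos2)
        if key in self._seen:
            return True
        self._seen.add(key)  # one stored entry per unique pair
        return False

    def __len__(self):
        return len(self._seen)


if __name__ == "__main__":
    tracker = DuplicateTracker()
    pairs = [
        ("chr1", 100, "chr2", 500),
        ("chr1", 100, "chr2", 500),  # same positions again: a duplicate
        ("chr1", 101, "chr2", 500),
    ]
    for pair in pairs:
        print(pair, tracker.is_duplicated(*pair))
    print("entries kept in memory:", len(tracker))
```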

I could think of several ways to save memory but none seems optimal:

How much memory do you have? Do you have a 64-bit system?

aj03 commented 8 years ago

Yes, the OS is 64-bit and memory is 19.5 GB. The disk is 309.5 GB, and my working directory has 134 GB of available space.

fidelram commented 8 years ago

Seems like enough memory, although we normally work with 300 GB.

I will add a branch with an option to skip the check for duplicated reads, but you may first want to run hicBuildMatrix with the option --doTestRun, which will give you a glimpse of the duplication rate. This option considers only 1 million reads and builds a matrix but, importantly, still reports the QC values.

fidelram commented 8 years ago

I added the branch: https://github.com/maxplanck-ie/HiCExplorer/tree/skip_duplication_check

You need to run hicBuildMatrix with the option --skipDuplicationCheck

Recently, I added snakemake rules to download and process Hi-C data. This workflow may be helpful in your case (but add the --skipDuplicationCheck option to the rules). You can find the rules in the /scripts folder (https://github.com/maxplanck-ie/HiCExplorer/tree/master/scripts).