colomemaria / epiScanpy

Episcanpy: Epigenomics Single Cell Analysis in Python
BSD 3-Clause "New" or "Revised" License
139 stars 33 forks source link

build_mtx_fly tabix value error #56

Closed le-ander closed 4 years ago

le-ander commented 4 years ago

I'm hitting another issue with build_mtx_fly() after overcoming the import issue. i have built windows using the following function: windows = epi.ct.make_windows(10000)

Building the feature matrix based on these windows unfortunately fails:

loading the barcodes
building the count matrix

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-4ae2376dfb31> in <module>
----> 1 epi.ct.bld_mtx_fly(paths['rawdir']+'fragments.tsv.gz', paths['rawdir']+'fragments.tsv.gz.tbi', windows)

~/.local/lib/python3.7/site-packages/episcanpy/count_matrix/_bld_atac_mtx.py in bld_mtx_fly(tsv_file, tbi_file, annotation, csv_file, genome, DATADIR, save)
    156     for tmp_feat in window_list:
    157         vector = [0]*nb_barcodes
--> 158         for row in tbx.fetch(tmp_feat[0], tmp_feat[1], tmp_feat[2], parser=pysam.asTuple()):
    159         #for row in tbx.fetch(start=tmp_feat[0], end=tmp_feat[1], region=tmp_feat[2], parser=pysam.asTuple()):
    160             line = str(row).split('\t')[-2]

pysam/libctabix.pyx in pysam.libctabix.TabixFile.fetch()

ValueError: could not create iterator for region '1:1-10000'
DaneseAnna commented 4 years ago

Hi,

what genome(s) are you using for the assembly ? I added a quick fix for a 1 genome assembly. You can try adding the genome as an argument in epi.ct.bld_mtx_fly

Like such: epi.ct.bld_mtx_fly('atac_v1_hgmm_500_fragments.tsv.gz', 'atac_v1_hgmm_500_fragments.tsv.gz.tbi', windows, genome='hg19')

And install episcanpy again using the GitHub version

le-ander commented 4 years ago

This is the information I have on my sample:

"reference_annotation": "gencode.v28.basic",
"reference_annotation_gtf_url": "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.basic.annotation.gtf.gz",
"reference_assembly": "GRCh38",
"reference_assembly_accession": "GCA_000001405.15",
"reference_assembly_fasta_url": "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz",
"reference_organism": "Homo_sapiens",
"reference_version": "1.1",

The top of my fragments.tsv looks like this:

chr1    10084   10326   GCACCTTAGCTTACCA-1      1
chr1    10091   10210   GCGCCAATCTTTCGAT-1      1
chr1    10101   10283   CGTTCCACATTTGGCA-1      1
chr1    10138   10197   ACAGGCCTCCAACCTC-1      1
chr1    10144   10190   TCAGTTTCATTGAACC-1      1
chr1    10150   10198   AAATGAGAGAACTCCT-1      1
chr1    10150   10203   GTAGACTAGCAACGGT-1      1
chr1    10157   10192   TAGCCCTGTGATAGAT-1      3
chr1    10157   10198   TCAGTTTCATTGAACC-1      1
chr1    10272   10543   GCAGCTGGTAAGGTCG-1      1

Looks like it's working now if I specify genome=None. Nice!