CCBR / Pipeliner

An open-source and scalable solution to NGS analysis powered by the NIH's Biowulf cluster.

Python Package Import Incompatibility: pysam #374

Closed skchronicles closed 5 years ago

skchronicles commented 5 years ago

Secondary Issue related to HPC Snakemake Upgrade #371

Dry-running the second half of the ChIP-seq pipeline causes a python ImportError. This error is related to recent changes HPC has made to the python/3.5 module on Biowulf (see issue #371 for more information).

The import error is caused by the version of Python that is bundled with snakemake (see: /usr/local/Anaconda/envs_app/snakemake/5.1.3/bin/). That interpreter ships only the Python 3.6 standard library, so pysam is not available to it.
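To confirm which interpreter is actually running and whether it can see pysam, a check along these lines may help. This is a minimal sketch: the helper names `module_available` and `describe_environment` are hypothetical and not part of Pipeliner; only standard-library calls are used.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


def describe_environment(modules=("pysam",)):
    """Collect the interpreter path, its version, and the availability of each module."""
    report = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
    }
    for name in modules:
        report[name] = module_available(name)
    return report


if __name__ == "__main__":
    # Print one "key: value" line per entry, e.g. the interpreter path and pysam status
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

Running this with the interpreter that snakemake uses, and again with the interpreter from the python/3.5 module, should show whether the two environments differ in what they can import.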

skchronicles commented 5 years ago

Update

After closer inspection, it looks like pysam is not actively being used. The call to normalize_bam_file_chromosomes, the only code that used pysam, has been commented out. For the time being, I will move the import statement inside the function normalize_bam_file_chromosomes, so that pysam is only imported if that function is actually called.

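def normalize_bam_file_chromosomes(bamfns, obamfns=[], suffix='.common_chrom.bam'):
    from collections import Counter
    # pysam is imported here rather than at module level, so the module
    # still loads when pysam is not installed
    from pysam import Samfile, FastaFile

    # Count mapped reads per reference sequence in each BAM file
    counts = []
    for bamfn1 in bamfns:
        bam1 = Samfile(bamfn1)

        cnt1 = Counter()
        for aread1 in bam1:
            if aread1.is_unmapped:
                continue
            cnt1[aread1.reference_name] += 1
        bam1.close()
        counts.append(cnt1)

    # Keep only the reference names that appear in every file
    common = None
    for cnt in counts:
        if common is not None:
            common = common.intersection(cnt)
        else:
            common = set(cnt)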
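
The effect of a function-level import can be checked in isolation. The sketch below is illustrative only: `not_a_real_module_xyz` is a deliberately non-existent module name standing in for pysam, and both function names are hypothetical.

```python
def uses_optional_dependency():
    # Deferred import: evaluated only when this function is called
    import not_a_real_module_xyz  # hypothetical name standing in for pysam
    return not_a_real_module_xyz


def safe_helper():
    # Does not touch the optional dependency at all
    return "ok"


# Defining the functions above raised nothing, so unrelated code still runs
result = safe_helper()

# The ImportError only surfaces when the dependent function is invoked
try:
    uses_optional_dependency()
    call_failed = False
except ImportError:
    call_failed = True
```

With the import at module level instead, the failure would occur as soon as the file is loaded, which matches the dry-run error described above.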

Changes will be made in the following locations:

  1. module load ccbrpipeliner/3.0
  2. master branch
  3. activeDev branch

Permanent solution:

Remove the snakemake run block from the PePr rule and move its logic into a standalone script. This will allow us to submit a job that uses the exact version of python required, instead of the interpreter bundled with snakemake.
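
One way the rule could invoke such a script is by building a shell command that loads a pinned Python module first. This is a rough sketch under assumptions: the helper `build_pepr_command` and the script name `pepr_wrapper.py` are hypothetical, and `python/3.5` is used as the default only because it is the module mentioned in this issue.

```python
import shlex


def build_pepr_command(script, args, python_module="python/3.5"):
    """Build a shell command that loads a pinned Python module, then runs a script.

    Every argument is shell-quoted so paths containing spaces survive intact.
    """
    quoted = " ".join(shlex.quote(part) for part in [script] + list(args))
    return "module load {}; python {}".format(shlex.quote(python_module), quoted)


# Hypothetical script name and arguments, for illustration only
command = build_pepr_command("pepr_wrapper.py", ["--input", "sample 1.bam"])
print(command)
# module load python/3.5; python pepr_wrapper.py --input 'sample 1.bam'
```

A string like this could be placed in the rule's shell directive, so the job runs under the loaded module's interpreter rather than the one that executes snakemake itself.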

skchronicles commented 5 years ago

Resolved