cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

Issue running mira.tl.get_motif_hits_in_peaks #39

Closed (jmmuncie closed this issue 3 months ago)

jmmuncie commented 6 months ago

I am encountering an issue while trying to annotate motifs in my peaks using get_motif_hits_in_peaks. The log output and error message are below. The same error occurs when I run the function on the provided sample dataset. It seems to break at the very end of the motif scanning and, frustratingly, does not provide a useful error message.

I confirmed that my peaks.var information is properly formatted with 'chr', 'start', and 'end' columns (see attached screenshot).

I am running MIRA 1.0.4 and moods 1.9.4.1.

Any advice you have for troubleshooting this would be greatly appreciated!

I'm running:

import logging
import mira

# Now annotate motifs
mira.tools.motif_scan.logger.setLevel(logging.INFO) # make sure progress messages are displayed

mira.tl.get_motif_hits_in_peaks(peaks_cp,
                    genome_fasta='genome_mm10/mm10.fa',
                    chrom = 'chr', start = 'start', end = 'end',
                    pvalue_threshold=0.0001) # indicate chrom, start, end of peaks

Logging and error:

INFO:mira.tools.motif_scan:Getting peak sequences ...
194046it [00:11, 16537.13it/s]
INFO:mira.tools.motif_scan:Scanning peaks for motif hits with p >= 0.0001 ...
INFO:mira.tools.motif_scan:Building motif background models ...
INFO:mira.tools.motif_scan:Starting scan ...
INFO:mira.tools.motif_scan:Found 1000000 motif hits ...
INFO:mira.tools.motif_scan:Found 2000000 motif hits ...
INFO:mira.tools.motif_scan:Found 3000000 motif hits ...
INFO:mira.tools.motif_scan:Found 4000000 motif hits ...
INFO:mira.tools.motif_scan:Found 5000000 motif hits ...
INFO:mira.tools.motif_scan:Found 6000000 motif hits ...
INFO:mira.tools.motif_scan:Found 7000000 motif hits ...
INFO:mira.tools.motif_scan:Found 8000000 motif hits ...
INFO:mira.tools.motif_scan:Found 9000000 motif hits ...
INFO:mira.tools.motif_scan:Found 10000000 motif hits ...
INFO:mira.tools.motif_scan:Found 11000000 motif hits ...
INFO:mira.tools.motif_scan:Found 12000000 motif hits ...
INFO:mira.tools.motif_scan:Found 13000000 motif hits ...
INFO:mira.tools.motif_scan:Found 14000000 motif hits ...
INFO:mira.tools.motif_scan:Found 15000000 motif hits ...
INFO:mira.tools.motif_scan:Found 16000000 motif hits ...
INFO:mira.tools.motif_scan:Found 17000000 motif hits ...
INFO:mira.tools.motif_scan:Found 18000000 motif hits ...
INFO:mira.tools.motif_scan:Found 19000000 motif hits ...
INFO:mira.tools.motif_scan:Found 20000000 motif hits ...
INFO:mira.tools.motif_scan:Found 21000000 motif hits ...
INFO:mira.tools.motif_scan:Found 22000000 motif hits ...
INFO:mira.tools.motif_scan:Found 23000000 motif hits ...
INFO:mira.tools.motif_scan:Found 24000000 motif hits ...
INFO:mira.tools.motif_scan:Found 25000000 motif hits ...
INFO:mira.tools.motif_scan:Found 26000000 motif hits ...
INFO:mira.tools.motif_scan:Found 27000000 motif hits ...
INFO:mira.tools.motif_scan:Found 28000000 motif hits ...
INFO:mira.tools.motif_scan:Found 29000000 motif hits ...
INFO:mira.tools.motif_scan:Found 30000000 motif hits ...
INFO:mira.tools.motif_scan:Found 31000000 motif hits ...
INFO:mira.tools.motif_scan:Found 32000000 motif hits ...
INFO:mira.tools.motif_scan:Found 33000000 motif hits ...
INFO:mira.tools.motif_scan:Found 34000000 motif hits ...
INFO:mira.tools.motif_scan:Found 35000000 motif hits ...
INFO:mira.tools.motif_scan:Found 36000000 motif hits ...
INFO:mira.tools.motif_scan:Found 37000000 motif hits ...
INFO:mira.tools.motif_scan:Found 38000000 motif hits ...
INFO:mira.tools.motif_scan:Found 39000000 motif hits ...
INFO:mira.tools.motif_scan:Found 40000000 motif hits ...
INFO:mira.tools.motif_scan:Found 41000000 motif hits ...
INFO:mira.tools.motif_scan:Found 42000000 motif hits ...
INFO:mira.tools.motif_scan:Found 43000000 motif hits ...
INFO:mira.tools.motif_scan:Found 44000000 motif hits ...
INFO:mira.tools.motif_scan:Found 45000000 motif hits ...
INFO:mira.tools.motif_scan:Found 46000000 motif hits ...
INFO:mira.tools.motif_scan:Found 47000000 motif hits ...
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/scratch/jmuncie/ipykernel_3775275/180803331.py in <module>
      9                     genome_fasta='genome_mm10/mm10.fa',
     10                     chrom = 'chr', start = 'start', end = 'end',
---> 11                     pvalue_threshold=0.0001) # indicate chrom, start, end of peaks

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/adata_interface/core.py in _run(adata, *args, **kwargs)
     70             #print(fetch(None, adata, **getter_kwargs))
     71 
---> 72             output = func(**fetch(None, adata, **getter_kwargs), **function_kwargs)
     73             #print(output, adata, adder_kwargs)
     74             return add(adata, output, **adder_kwargs)

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/tools/motif_scan.py in get_motif_hits_in_peaks(peaks, pvalue_threshold, genome_fasta)
    253         get_peak_sequences(peaks, genome_fasta, temp_fasta_name)
    254 
--> 255         hits_matrix = get_motif_hits(temp_fasta_name, len(peaks), pvalue_threshold = pvalue_threshold)
    256 
    257         ids, factors = list(zip(*list_motif_ids()))

~/.conda/envs/mira-env/lib/python3.7/site-packages/mira/tools/motif_scan.py in get_motif_hits(peak_sequences_file, num_peaks, pvalue_threshold)
    169 
    170     if not process.poll() == 0:
--> 171         raise Exception('Error while scanning for motifs: ' + process.stderr.read().decode())
    172 
    173     logger.info('Formatting hits matrix ...')

Exception: Error while scanning for motifs: 

[Attached screenshot: peaks var]

AllenWLynch commented 5 months ago

Yes, it appears MOODS is not outputting anything to stderr that can help us out. Can you confirm that all of your peaks intersect with the mm10 genome? Another test would be to split your peaks by chromosome, or into some other chunks, to see whether any particular chunk causes the error.
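Both checks can be sketched with the standard library alone. This is a rough illustration, not part of MIRA: the helper names (`read_chrom_sizes`, `find_bad_peaks`, `group_by_chrom`) are hypothetical, and it assumes a FASTA index such as `mm10.fa.fai` exists next to the genome file (for example, one created by `samtools faidx`), whose first two tab-separated columns are the sequence name and length.

```python
from collections import defaultdict


def read_chrom_sizes(fai_path):
    """Read chromosome lengths from a FASTA index (.fai) file.

    Assumes the first two tab-separated columns are name and length.
    """
    sizes = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes


def find_bad_peaks(peaks, chrom_sizes):
    """Return (index, reason) for peaks that do not fit the genome.

    `peaks` is a list of (chrom, start, end) tuples.
    """
    bad = []
    for index, (chrom, start, end) in enumerate(peaks):
        start, end = int(start), int(end)
        if chrom not in chrom_sizes:
            bad.append((index, "unknown chromosome"))
        elif start < 0 or end <= start:
            bad.append((index, "invalid interval"))
        elif end > chrom_sizes[chrom]:
            bad.append((index, "extends past chromosome end"))
    return bad


def group_by_chrom(peaks):
    """Map each chromosome name to the list of peak indices on it."""
    groups = defaultdict(list)
    for index, (chrom, _start, _end) in enumerate(peaks):
        groups[chrom].append(index)
    return dict(groups)
```

With the AnnData object from the original post, the input list could presumably be built as `list(zip(peaks_cp.var['chr'], peaks_cp.var['start'], peaks_cp.var['end']))`. The index lists from `group_by_chrom` can then be used to subset the peaks (for example `peaks_cp[:, indices]`) and run the motif scan one chromosome at a time.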

AL

jmmuncie commented 5 months ago

Hi Allen,

Thanks for your reply! I was running these analyses on a high-performance computing cluster (UCSF Wynton HPC). I tried transferring the data and models back to my local machine, and when I ran it there, it worked. Both machines use identical conda environments. I know it's not really possible for you to fully troubleshoot environment-specific issues, but do you have any quick thoughts on what might be happening? I've been able to run all other MIRA analyses on the Wynton cluster, and it would definitely be easier not to have to move data back and forth.

AllenWLynch commented 5 months ago

One thought I had is that MOODS may not be building correctly on the HPC cluster. Without access to the environment it is indeed very challenging to figure out what is happening. One option is to try a minimal example with MOODS on its own to see whether anything runs at all. There's some example data in the MOODS repo that I have used before: https://github.com/jhkorhonen/MOODS.git
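Because the traceback above raises whenever `process.poll()` is not 0, and stderr came back empty, it may also help to look at how the scanning subprocess actually exited on the cluster. The sketch below uses only the Python standard library. `describe_exit` and `run_and_report` are hypothetical helpers, not MIRA or MOODS functions, and the command to run is left to the reader.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a Popen return code into a human-readable description."""
    if returncode is None:
        # poll() returns None while the child has not terminated yet.
        return "still running (poll() returned None)"
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run a command to completion and describe how it ended."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # waits for exit and drains both pipes
    return describe_exit(proc.returncode), out.decode(), err.decode()


if __name__ == "__main__":
    # Placeholder command; substitute the scan command under test.
    status, out, err = run_and_report([sys.executable, "-c", "print('ok')"])
    print(status)
```

A negative return code would point to the process being killed by a signal, which on a shared cluster could mean a memory or time limit, while a positive code with empty stderr would point back at the MOODS build itself. Both readings are guesses until tested in the failing environment.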

jmmuncie commented 3 months ago

Hi Allen,

Thanks for your thoughts on this. I suspect you may be correct that MOODS is not building correctly on the HPC, or is not set up correctly within my environment. These analyses have moved to the back burner for me for now, so I haven't had a chance to troubleshoot any further. I'll mark this as closed for now and circle back if I discover a fix that might be helpful to other users who run into similar issues.