Error: returns "hg38 not found" when the folder has hg38 file

UcarLab / PEAS

Code repository for PEAS (Predict Enhancers from ATAC-seq), including feature extraction files and easy to use python script for training enhancer models and predicting enhancers using MLP Neural Networks.

MIT License


Error: returns "hg38 not found" when the folder has hg38 file #5

Open Ysuita opened 3 years ago

Ysuita commented 3 years ago

I received the error below when I tried to run feature extraction.

"--- Filtering peaks. ---
--- Calling annotations & known motifs. ---

!!!!Genome hg38 not found in /gpfs/runtime/opt/homer/4.10/.//config.txt"

I found that config.txt exists at "/gpfs/runtime/opt/homer/4.10/config.txt". How can I change the directory that is used for the configuration?

The GUI only has configuration options for the PEAS software itself, not for the other tools. I tried running feature extraction in the terminal by copying and pasting the commands, but these commands do not seem to take a directory for this config.txt.

These commands also do not seem to be able to access the other fa file that I provided, because of the same directory error.

It would be great if you could show me how to change the directory for the configuration! Thanks for the help.

ajt986 commented 3 years ago

Hello,

I would double check to make sure hg38 has been installed for HOMER: perl /path-to-homer/configureHomer.pl -install hg38

See http://homer.ucsd.edu/homer/introduction/install.html for the installation instructions.
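A quick way to run that check is sketched below in Python. It rests on two assumptions that are not confirmed in this thread: that HOMER's config.txt is tab-delimited, and that an installed genome appears as the first field of some line. The helper name genome_listed and the sample text are hypothetical. The sketch also shows that the path printed in the error message normalizes to the config.txt found on disk, so the file location itself is probably not the problem.

```python
import os.path


def genome_listed(config_text: str, genome: str) -> bool:
    """Return True if any line's first tab-delimited field equals `genome`.

    Assumption: config.txt lists each installed package on its own
    tab-delimited line, with the package name in the first field.
    """
    for line in config_text.splitlines():
        fields = line.split("\t")
        if fields[0].strip() == genome:
            return True
    return False


# Path exactly as printed in the error message from this thread.
reported = "/gpfs/runtime/opt/homer/4.10/.//config.txt"
# normpath collapses "." and the doubled slash.
print(os.path.normpath(reported))

# Hypothetical excerpt, for illustration only; not copied from a real install.
sample = "GENOMES\nhg19\tv6.0\thuman genome (hg19)\n"
print(genome_listed(sample, "hg38"))
```

To check a real installation, read the file with open(path).read() and pass the text to genome_listed.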

If it is installed and still giving an error, you can try to modify the shell script: PEASFeatureExtraction.sh

Specifically, you'll want to modify these lines:

annotatePeaks.pl "${prefix}_peaks.filtered" "${homerref}" -m "${homermotifs}" -nmotifs > ${prefix}_peaks_annotated.bed

findMotifsGenome.pl "${prefix}_peaks.filtered" "${fasta}" "${outDir}/denovo"

annotatePeaks.pl "${prefix}_peaks.filtered" "${homerref}" -m "${outDir}/denovo/merge/merged.motifs" -nmotifs > ${prefix}_peaks_denovo.bed

annotatePeaks.pl "${prefix}_peaks.filtered" "${homerref}" -m "${ctcfmotifs}" -nmotifs > ${prefix}_peaks_ctcf.bed

Here, ${fasta} refers to the path of the full genome fasta file, and ${homerref} refers to the genome build (e.g., hg38). You can update these HOMER commands to include other configurations.

Best, Asa

Ysuita commented 3 years ago

Thanks for the answer, Asa! I managed to extract PEAS features, and now I'm trying to predict promoters and enhancers using singularity. When I ran /PEAS/singularity/PEASFeatureExtraction-singularity.sh, I got an error saying "/PEAS/singularity/PEASPrediction-singularity.sh: line 50: outdir/TMP_PEAS_hg38_FILELIST.txt: No such file or directory". When I checked the output directory, it contained no files. What should I do so that TMP_PEAS_hg38_FILELIST.txt gets created? Thanks so much for the help!

ajt986 commented 3 years ago

It looks like it just can't find the path to the output directory, which looks like it's "outdir" based on the error message. You'll need to provide the full path to an existing directory so that TMP_PEAS_hg38_FILELIST.txt can be created to run the prediction. Note that this file will be removed. The script uses it to convert the simpler input into the generic file list format that the Python script expects.
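As a minimal sketch of that advice, the Python snippet below turns an output directory argument into a full path and creates it if it is missing, before the path is handed to the prediction script. The helper name prepare_outdir and the directory name peas_output are hypothetical; nothing here is part of PEAS itself.

```python
import tempfile
from pathlib import Path


def prepare_outdir(outdir: str) -> Path:
    """Return `outdir` as an absolute path, creating the directory if needed."""
    path = Path(outdir).expanduser().resolve()
    # parents=True creates missing parent directories;
    # exist_ok=True makes the call safe when the directory already exists.
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demonstrate inside a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        out = prepare_outdir(str(Path(tmp) / "peas_output"))
        print(out, out.is_dir())
```

Passing the returned path, rather than a bare relative name such as "outdir", avoids depending on the working directory the script happens to run in.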

Ysuita commented 3 years ago

Thanks for the response! I solved this issue and was able to identify enhancers!

I have other questions - is it possible to apply PEAS to rat cells? Does it have rg6 as a reference genome? Have you tested it before?

ajt986 commented 3 years ago

Glad to hear! We haven't tested PEAS on rg6. There will be issues with the existing model due to differences in the known motifs that the model uses. However, the tools are there to train a new model if rg6 reference data (i.e., ATAC-seq and enhancer annotation data) is available with appropriate class labels.

Ysuita commented 3 years ago

Hi, I got the appropriate ATAC-seq bam files and enhancer annotation data (H3K27ac ChIP-seq bed file)! To my understanding, training a new model first requires feature extraction from the new ATAC-seq and enhancer annotation data. But how do I pass this enhancer annotation data to feature extraction? I see that PEASFeatureExtraction.sh takes 8 arguments, but none of them sounds like an enhancer annotation file. It would be great if you could guide me with this. Thank you very much for the help!

ajt986 commented 3 years ago

After extracting the features, you need to append a column containing the enhancer information. You'll also need to create your own classes.txt file, which specifies which column holds the class labels and how to convert each label to an integer value. The classes.txt file has one line per class, with 3 tab-delimited columns: the column position of the annotation in the feature file (starting from 0), the class label, and the integer class label (e.g., 0 or 1).

Example:

27	Enhancer	1
27	Non-enhancer	0

The first column (27, counted from 0) indicates that the 28th column of the feature file holds the annotation information. The second and third columns say to convert the label Enhancer to 1 and Non-enhancer to 0.
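The two steps described above can be sketched in Python as follows. This is an illustration under assumptions, not the PEAS implementation: it assumes the extracted feature file is tab-delimited with no header row, and that you have already decided one label per peak (for example from overlap with the H3K27ac peaks). The function names append_labels and classes_txt are hypothetical.

```python
import csv
import io


def append_labels(features_tsv: str, labels: list[str]) -> tuple[str, int]:
    """Append one label per row and return (new text, 0-based label column)."""
    rows = list(csv.reader(io.StringIO(features_tsv), delimiter="\t"))
    if not rows or len(rows) != len(labels):
        raise ValueError("need exactly one label per feature row")
    label_col = len(rows[0])  # the new column goes after the last existing one
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row, label in zip(rows, labels):
        writer.writerow(row + [label])
    return out.getvalue(), label_col


def classes_txt(label_col: int, mapping: dict[str, int]) -> str:
    """Build classes.txt content: column position, class label, integer label."""
    return "".join(
        f"{label_col}\t{label}\t{value}\n" for label, value in mapping.items()
    )


if __name__ == "__main__":
    # Tiny made-up feature table with 4 columns, for illustration only.
    features = "chr1\t100\t200\t0.5\nchr1\t300\t400\t0.1\n"
    labeled, col = append_labels(features, ["Enhancer", "Non-enhancer"])
    print(labeled, end="")
    print(classes_txt(col, {"Enhancer": 1, "Non-enhancer": 0}), end="")
```

Deriving the column position from the data, instead of hard-coding 27, keeps classes.txt consistent with the feature file if the number of feature columns differs.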