Closed loganminhdang closed 2 months ago
I'm not too sure myself about this error. I think using fakeroot (the --fakeroot/-f option) might help when building the sandbox.
An alternative solution is available if you have a local system where you do have sudo permissions. You can build the sandbox directory, install homer, convert it back into a .sif file and then copy the .sif with the HOMER references onto the cluster.
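A minimal sketch of that workflow, assuming a local machine where you have sudo; the sandbox, image, and cluster path names below are hypothetical examples:

```shell
# On a local machine with sudo: unpack the image into a writable sandbox,
# install HOMER inside it, then repackage it as a .sif for the cluster.
sudo singularity build --sandbox coreatac_sandbox/ CoRE-ATAC.sif
sudo singularity shell --writable coreatac_sandbox/
# ...inside the shell, install HOMER and its references, then exit...
sudo singularity build CoRE-ATAC-homer.sif coreatac_sandbox/
scp CoRE-ATAC-homer.sif user@cluster:/path/on/cluster/
```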
Thanks for the advice! Using --fakeroot did not help in my case, but I'm setting up CoRE-ATAC to run on my local machine instead. However, I have encountered some setbacks, which I hope to be able to resolve with your support. Thank you.
chr1 3012670 3012826 peak_0 1000 . 314.525085 4.641444 -1 78
chr1 3042895 3043287 peak_1 1000 . 746.472717 6.212195 -1 346
chr1 3094459 3095530 peak_2 1000 . 3474.600342 10.301152 -1 778
.....
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at org.jax.bamextractor.FileMap.getFileMap(FileMap.java:17)
	at org.jax.bamextractor.ReadExtractor.main(ReadExtractor.java:47)
	... 5 more
mkdir: cannot create directory ‘peak_features’: File exists
--- Calling annotations & known motifs. ---
Peak file = /home/quangdang/FeaturesSCNSD_BLfiltered_peaks.txt_original.txt
Genome = mm10
Organism = mouse
Motif files:
/CoRE-ATAC//PEAS/humantop_Nov2016_HOMER.motifs -m
Will report the number of motifs in each peak
Peak/BED file conversion summary:
BED/Header formatted lines: 0
peakfile formatted lines: 0
Duplicated Peak IDs: 0
Peak File Statistics:
Total Peaks: 0
Redundant Peak IDs: 0
Peaks lacking information: 0 (need at least 5 columns per peak)
Peaks with misformatted coordinates: 0 (should be integer)
Peaks with misformatted strand: 0 (should be either +/- or 0/1)
Only 0 peaks found! If many more expected, maybe the file format is for Macs only
......
......
--- Getting insert size threshold. ---
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: htsjdk.samtools.util.RuntimeIOException: java.io.FileNotFoundException: SCNSD_BLfiltered_sorted.bam (No such file or directory)
	at htsjdk.samtools.FileInputResource$1.get(SamInputResource.java:256)
	at htsjdk.samtools.FileInputResource$1.get(SamInputResource.java:250)
	at htsjdk.samtools.util.Lazy.get(Lazy.java:25)
	at htsjdk.samtools.FileInputResource.asUnbufferedSeekableStream(SamInputResource.java:291)
	at htsjdk.samtools.FileInputResource.asUnbufferedInputStream(SamInputResource.java:299)
	at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:367)
	at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:208)
	at org.jax.peastools.insertmetrics.PeakThreshold.getThreshold(PeakThreshold.java:80)
	at org.jax.peastools.insertmetrics.PeakThreshold.doIt(PeakThreshold.java:45)
	at org.jax.peastools.insertmetrics.PeakThreshold.main(PeakThreshold.java:37)
	at org.jax.peastools.Main.main(Main.java:36)
	... 5 more
As you can see, there were some issues with reading the peak files. Any guidance on fixing this error would be greatly appreciated!
For the HOMER motifs, it looks like this was mistakenly hardcoded. In FeatureExtractor.sh, you'll need to replace the known motifs with the location of your motif file:

HOMERMOTIFS=${path}PEAS/humantop_Nov2016_HOMER.motifs

Just change this line to something like:

HOMERMOTIFS="/your/path/to/KnownMouseMotifs.motifs"

This will follow HOMER inputs. You may need to bind the path to your new motif file when running Singularity using the -B option, or you can add the motif file within the Singularity sandbox directory itself.
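As a sketch of the bind step (the image name and motif path below are hypothetical examples, not the exact files from this thread):

```shell
# Hypothetical paths: bind the directory holding the mouse motif file,
# then confirm the path set in HOMERMOTIFS resolves inside the container.
singularity exec -B /data/motifs CoRE-ATAC.sif \
    ls /data/motifs/KnownMouseMotifs.motifs
```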
The additional columns shouldn't cause an issue as long as the first 3 are chromosome, start, and end. The input expects a tab-delimited peak file, though. If the peak file is not tab-delimited, then the chromosome is read as the entire line. Since the entire line doesn't match a chromosome in the chromosome list, the peak list becomes empty, causing the errors you see. I think if you just reformat your peaks to be tab-delimited, it should work.
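A quick way to check for and repair the delimiter issue described above (a minimal sketch; the helper name and the sample line are illustrative only):

```python
# Minimal sketch: rejoin whitespace-delimited peak lines with real tabs so
# the chromosome parses as its own field rather than the whole line.
def retab_peaks(lines):
    """Split each peak line on any whitespace and rejoin with tabs."""
    fixed = []
    for line in lines:
        fields = line.split()          # handles spaces or mixed whitespace
        if fields:                     # skip blank lines
            fixed.append("\t".join(fields))
    return fixed

# A space-delimited line like the ones pasted above becomes tab-delimited:
row = retab_peaks(["chr1 3012670 3012826 peak_0 1000 ."])[0]
assert row.split("\t")[0] == "chr1"   # chromosome is now a separate field
```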
Thanks for your advice. My .narrowPeak files are definitely tab-delimited, so I'm trying to figure out why the peak files are not being read. I modified the peak files, retaining only the first three columns, but it still doesn't seem to work.
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at org.jax.bamextractor.FileMap.getFileMap(FileMap.java:17)
	at org.jax.bamextractor.ReadExtractor.main(ReadExtractor.java:47)
	... 5 more
mkdir: cannot create directory 'peak_features': File exists
--- Calling annotations & known motifs. ---
Peak file = /path/to/Core_ATAC/CoRE-ATAC-FeaturePredictor-mm10/Feature_extraction/CTXSD_BLfiltered_peaks.txt_original.txt
Genome = mm10
Organism = mouse
Motif files:
/CoRE-ATAC//PEAS/humantop_Nov2016_HOMER.motifs -m
Will report the number of motifs in each peak
Peak/BED file conversion summary:
BED/Header formatted lines: 0
peakfile formatted lines: 0
Duplicated Peak IDs: 0
Peak File Statistics:
Total Peaks: 0
Redundant Peak IDs: 0
Peaks lacking information: 0 (need at least 5 columns per peak)
Peaks with misformatted coordinates: 0 (should be integer)
Peaks with misformatted strand: 0 (should be either +/- or 0/1)
My command is as below:
singularity exec -B /my/home/path/ /path/to/CoRE-ATAC-FeaturePredictor-mm10/ /CoRE-ATAC/CoRE-ATACFeatureExtraction-singularity.sh /my/home/path/.bam /path/to/Core_ATAC/CoRE-ATAC-FeaturePredictor-mm10/.bed mm10 /path/to/mm10.fa /path/to/CoRE-ATAC-FeaturePredictor-mm10/Chromosome/list.txt /path/to/CoRE-ATAC-FeaturePredictor-mm10/Feature_extraction/
I used the -B option to bind my home directory because initially the paths to my input BAM and BED files could not be found ("no such file or directory"). Maybe this is where the error arises: the input files cannot be located for some reason. The same issue persists on both my local machine and the cluster. I moved my input files inside the sandbox directory, but that doesn't seem to solve the issue either.
I'm quite stuck and would very much appreciate you providing some guidance. Thank you for your time
The first exception tells me that the file that lists the different chromosome FASTA files isn't being read correctly. This file is tab delimited and should look like this:
chr1 /projects/chromosomes/chr1.fa.gz
chr2 /projects/chromosomes/chr2.fa.gz
chr3 /projects/chromosomes/chr3.fa.gz
It still looks like the script isn't finding these files, though. This chromosome list file is especially important because the peaks are checked against it, and any peak with a chromosome not matching this file will be excluded. So if this file is not being read correctly, it will consequently result in an empty peak file as well.
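To make the failure mode concrete, here is a sketch of how a tab-delimited chromosome map gets parsed (a Python analogue of the Java FileMap behavior described above; the field layout is assumed from the example lines):

```python
# Sketch: parse "chrom<TAB>fasta_path" lines into a dict.
def parse_chrom_map(lines):
    chrom_to_fasta = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # If a line is space-delimited instead of tab-delimited, split("\t")
        # yields a single field and fields[1] raises IndexError -- the Python
        # analogue of "ArrayIndexOutOfBoundsException: Index 1 out of bounds
        # for length 1" seen in the stack traces above.
        chrom_to_fasta[fields[0]] = fields[1]
    return chrom_to_fasta

good = parse_chrom_map(["chr1\t/projects/chromosomes/chr1.fa.gz"])
assert good["chr1"].endswith("chr1.fa.gz")

try:
    parse_chrom_map(["chr1 /projects/chromosomes/chr1.fa.gz"])  # spaces, not tabs
except IndexError:
    pass  # same failure mode as the Java exception
```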
My suggestion is to enter the Singularity image as a shell using singularity shell and check whether you can access those files from inside the shell using commands like ls. If you find paths that work, using those paths should get the feature extraction running, assuming the peak file etc. are read properly in the right format.
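For example (the image name and file names below are hypothetical; the bind path is taken from the command above), the check might look like:

```shell
# Open an interactive shell inside the image, binding the data directory.
singularity shell -B /my/home/path CoRE-ATAC.sif
# Inside the container, verify the inputs are visible at the expected paths:
ls -l /my/home/path/sample.bam /my/home/path/peaks.bed
head -n 3 /my/home/path/peaks.bed   # also a chance to eyeball the delimiter
```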
Hi Asa,
Sorry for digging up this conversation again after a long while. I have some recent data that motivates a re-analysis with the CoRE-ATAC model. In the FeatureExtractor.sh script, I saw a BAMExtractor.jar tool being referenced, but it was not included in the GitHub repository. Can you provide that file?
My thanks in advance, in case you are still maintaining this project.
No problem! Here is the jar file for the BAMExtractor tool used.
@ajt986 Thank you so much! If you don't mind, are you able to share the input files you used in your study (e.g., original and reformatted peak files, chromosome list files, etc.)?
When running FeatureExtractor.sh, I still struggle with the same error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at org.jax.bamextractor.FileMap.getFileMap(FileMap.java:17)
	at org.jax.bamextractor.ReadExtractor.main(ReadExtractor.java:47)
When I tried running PeakFormatter.py alone, my peak files could not be recognized, resulting in an empty output. My peak files are output from MACS2 peak calling, so I'm unsure why the peak-formatting step didn't work. I retried the BAMExtractor.jar step using 500-bp fixed-width peaks that I generated, but that also didn't work.
If you could provide the input files that you used, I would be grateful. Thank you!
HEPG2_shifted_peaks.filtered.zip

Hello,
Here's an input peak file that I used. See if you get formatted peak results with this file using the peakformatter script. If it works, then the next step is to see what's different in the formatting of your file. If it doesn't, we need to figure out what's going wrong with the formatting script.
Best, Asa
Thank you for your support, Asa! I finally managed to finish the pipeline and was able to generate predictions, which seem quite sensible.
I do have one question as a follow-up. My ATAC-seq data are derived from mouse, and whenever I tried to use a mouse-specific motif file for the feature extraction step (by replacing HOMERMOTIFS=${path}PEAS/humantop_Nov2016_HOMER.motifs with a mouse-specific motif file), I encountered an error that doesn't happen if I use the original human motif file:
ValueError: Input 0 of layer peas_dense is incompatible with the layer: expected axis -1 of input shape to have value 30 but received input with shape [None, 29].
The full error message is below:
WARNING:tensorflow:Model was constructed with shape (None, 30, 1) for input Tensor("peas_input:0", shape=(None, 30, 1), dtype=float32), but it was called on an input with incompatible shape (None, 29, 1).
Traceback (most recent call last):
  File "/CoRE-ATAC/CoREATAC_PredictionTool.py", line 82, in
ValueError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1147 predict_function *
outputs = self.distribute_strategy.run(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run **
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1122 predict_step **
return self(x, training=False)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:927 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py:719 call
convert_kwargs_to_constants=base_layer_utils.call_context().saving)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py:888 _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:886 __call__
self.name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/input_spec.py:216 assert_input_compatibility
' but received input with shape ' + str(shape))
ValueError: Input 0 of layer peas_dense is incompatible with the layer: expected axis -1 of input shape to have value 30 but received input with shape [None, 29].
Do you have any idea what could be the issue? Thank you very much in advance.
Are you changing the HOMER reference to mouse (e.g., mm10), and was this reference added to the HOMER installation of the Singularity container?
The issue was indeed related: the motif file needed to be updated in the Singularity container. Thanks a lot for your help, Asa!
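For anyone hitting the same thing, a sketch of adding the mouse reference inside a writable sandbox (the sandbox name is a hypothetical example; configureHomer.pl is HOMER's own installer script):

```shell
# Enter the sandbox writably so HOMER can create files under its install dir.
sudo singularity shell --writable coreatac_sandbox/
# Inside the container, from the HOMER installation directory:
perl configureHomer.pl -install mm10   # install the mouse reference
```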
Hey, thank you for creating this useful tool! I'm a beginner with Singularity and am having some issues creating a writable Singularity container. When I try to enter the sandbox directory using the --writable option, I receive the following error:
WARNING: By using --writable, Singularity can't create /users destination automatically without overlay or underlay
FATAL: container creation failed: mount /var/singularity/mnt/session/users->/users error: while mounting /var/singularity/mnt/session/users: destination /users doesn't exist in container
I can only enter the directory by dropping the --writable option. But then the directory is read-only, so when I try to install HOMER, the newly installed files cannot be created:

.....
checkdir error: cannot create /HOMER/data/genomes
Read-only file system
unable to process data/genomes/mm10//chrUn_JH584304.fa.
checkdir error: cannot create /HOMER/data/genomes
Read-only file system
unable to process data/genomes/mm10//chr4.fa.
checkdir error: cannot create /HOMER/data/genomes
Read-only file system
unable to process data/genomes/mm10//chrUn_GL456392.fa.
....
I tried using sudo when creating the sandbox, but in my case that doesn't work (I'm using compute nodes on a cluster). Presumably due to this error, the feature extraction Singularity script could not be located when I tried to perform the extraction. I would appreciate any guidance, thank you.