UcarLab / CoRE-ATAC

MIT License

Singularity container is not writable #6

Closed. loganminhdang closed this issue 2 months ago.

loganminhdang commented 1 year ago

Hey, thank you for creating this useful tool! I'm a beginner with Singularity and am having some issues creating a writable Singularity container. When I try to enter the sandbox directory with the --writable option, I receive the following error:

    WARNING: By using --writable, Singularity can't create /users destination automatically without overlay or underlay
    FATAL: container creation failed: mount /var/singularity/mnt/session/users->/users error: while mounting /var/singularity/mnt/session/users: destination /users doesn't exist in container

I can only enter the directory by omitting the --writable option. As a result, when I try to install HOMER, the directory is read-only and the newly installed files cannot be created:

    .....
    checkdir error: cannot create /HOMER/data/genomes Read-only file system
    unable to process data/genomes/mm10//chrUn_JH584304.fa.
    checkdir error: cannot create /HOMER/data/genomes Read-only file system
    unable to process data/genomes/mm10//chr4.fa.
    checkdir error: cannot create /HOMER/data/genomes Read-only file system
    unable to process data/genomes/mm10//chrUn_GL456392.fa.
    ....

I tried using sudo when creating the sandbox, but that doesn't work in my case (I'm using compute nodes on a cluster). Presumably because of this error, the location of the feature extraction Singularity script could not be detected when I tried to run the extraction. I would appreciate any guidance, thank you.

ajt986 commented 1 year ago

I'm not too sure about this error myself. I think using fakeroot (the --fakeroot/-f option) might help when building the sandbox.

An alternative solution is available if you have a local system where you do have sudo permissions: build the sandbox directory there, install HOMER, convert it back into a .sif file, and then copy the .sif with the HOMER references onto the cluster.
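Roughly, that local workflow might look like the sketch below (image and directory names are placeholders, and --fakeroot can stand in for sudo on systems where it is enabled):

    # Build a writable sandbox from the released image (run locally with sudo)
    sudo singularity build --sandbox CoRE-ATAC_sandbox/ CoRE-ATAC.sif

    # Enter the sandbox with write access and install the HOMER references you need
    sudo singularity shell --writable CoRE-ATAC_sandbox/

    # After installing HOMER references inside, convert the sandbox back into a .sif
    sudo singularity build CoRE-ATAC_with_refs.sif CoRE-ATAC_sandbox/

    # Copy the rebuilt image (with the HOMER references baked in) to the cluster
    scp CoRE-ATAC_with_refs.sif user@cluster:/path/to/images/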

loganminhdang commented 1 year ago

Thanks for the advice! Using --fakeroot did not help in my case, so I'm setting up CoRE-ATAC to run on my local machine. However, I have encountered some setbacks, which I hope to resolve with your support. Thank you.

    chr1    3012670 3012826 peak_0  1000    .   314.525085  4.641444    -1  78
    chr1    3042895 3043287 peak_1  1000    .   746.472717  6.212195    -1  346
    chr1    3094459 3095530 peak_2  1000    .   3474.600342 10.301152   -1  778
    .....

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
        at org.jax.bamextractor.FileMap.getFileMap(FileMap.java:17)
        at org.jax.bamextractor.ReadExtractor.main(ReadExtractor.java:47)
        ... 5 more

mkdir: cannot create directory ‘peak_features’: File exists
--- Calling annotations & known motifs. ---

    Peak file = /home/quangdang/FeaturesSCNSD_BLfiltered_peaks.txt_original.txt
    Genome = mm10
    Organism = mouse
    Motif files:
            /CoRE-ATAC//PEAS/humantop_Nov2016_HOMER.motifs  -m
    Will report the number of motifs in each peak
    Peak/BED file conversion summary:
            BED/Header formatted lines: 0
            peakfile formatted lines: 0
            Duplicated Peak IDs: 0

    Peak File Statistics:
            Total Peaks: 0
            Redundant Peak IDs: 0
            Peaks lacking information: 0 (need at least 5 columns per peak)
            Peaks with misformatted coordinates: 0 (should be integer)
            Peaks with misformatted strand: 0 (should be either +/- or 0/1)

    Only 0 peaks found! If many more expected, maybe the file format is for Macs only

......
......
--- Getting insert size threshold. ---
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: htsjdk.samtools.util.RuntimeIOException: java.io.FileNotFoundException: SCNSD_BLfiltered_sorted.bam (No such file or directory)
        at htsjdk.samtools.FileInputResource$1.get(SamInputResource.java:256)
        at htsjdk.samtools.FileInputResource$1.get(SamInputResource.java:250)
        at htsjdk.samtools.util.Lazy.get(Lazy.java:25)
        at htsjdk.samtools.FileInputResource.asUnbufferedSeekableStream(SamInputResource.java:291)
        at htsjdk.samtools.FileInputResource.asUnbufferedInputStream(SamInputResource.java:299)
        at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:367)
        at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:208)
        at org.jax.peastools.insertmetrics.PeakThreshold.getThreshold(PeakThreshold.java:80)
        at org.jax.peastools.insertmetrics.PeakThreshold.doIt(PeakThreshold.java:45)
        at org.jax.peastools.insertmetrics.PeakThreshold.main(PeakThreshold.java:37)
        at org.jax.peastools.Main.main(Main.java:36)
        ... 5 more

As you can see, there were some issues with reading the peak files. Any guidance on fixing this error would be greatly appreciated!

ajt986 commented 1 year ago

For the HOMER motifs, it looks like this path was mistakenly hardcoded. In FeatureExtractor.sh, you'll need to replace the known-motifs path with the location of your own motif file. Just change the line

    HOMERMOTIFS=${path}PEAS/humantop_Nov2016_HOMER.motifs

to something like

    HOMERMOTIFS="/your/path/to/KnownMouseMotifs.motifs"

This follows HOMER's input format. You may need to bind the path to your new motif file when running Singularity using the -B option, or you can add the motif file within the Singularity sandbox directory itself.
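For instance, making a host directory that holds the motif file visible inside the container might look something like this (the image name, paths, and trailing arguments are placeholders, not CoRE-ATAC defaults):

    # Bind the host directory containing the mouse motif file into the container
    singularity exec -B /your/path/to/motifs CoRE-ATAC.sif \
        /CoRE-ATAC/CoRE-ATACFeatureExtraction-singularity.sh [usual arguments]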

The additional columns shouldn't cause an issue as long as the first three are chromosome, start, and end. The input does expect a tab-delimited peak file, though. If the peak file is not tab delimited, the chromosome field is read as the entire line; since the entire line doesn't match any chromosome in the chromosome list, the peak list becomes empty, causing the errors you see. I think if you simply reformat your peaks to be tab delimited, it should work.
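If the file turns out to be space delimited, something like the following should force tabs (standard awk/coreutils; file names are placeholders):

    # Show the delimiters: tabs appear as ^I on GNU systems
    head -2 peaks.narrowPeak | cat -A

    # Rebuild every line with tab as the output field separator
    awk -v OFS='\t' '{$1=$1; print}' peaks.narrowPeak > peaks_tab.narrowPeak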

loganminhdang commented 1 year ago

Thanks for your advice. My .narrowPeak files are definitely tab delimited, so I'm trying to figure out why the peak files are not being read. I modified the peak files, retaining only the first three columns, but it still doesn't seem to work.

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
        at org.jax.bamextractor.FileMap.getFileMap(FileMap.java:17)
        at org.jax.bamextractor.ReadExtractor.main(ReadExtractor.java:47)
        ... 5 more

mkdir: cannot create directory 'peak_features': File exists
--- Calling annotations & known motifs. ---

    Peak file = /path/to/Core_ATAC/CoRE-ATAC-FeaturePredictor-mm10/Feature_extraction/CTXSD_BLfiltered_peaks.txt_original.txt
    Genome = mm10
    Organism = mouse
    Motif files:
            /CoRE-ATAC//PEAS/humantop_Nov2016_HOMER.motifs  -m
    Will report the number of motifs in each peak
    Peak/BED file conversion summary:
            BED/Header formatted lines: 0
            peakfile formatted lines: 0
            Duplicated Peak IDs: 0

    Peak File Statistics:
            Total Peaks: 0
            Redundant Peak IDs: 0
            Peaks lacking information: 0 (need at least 5 columns per peak)
            Peaks with misformatted coordinates: 0 (should be integer)
            Peaks with misformatted strand: 0 (should be either +/- or 0/1)

My command is as below:

singularity exec -B /my/home/path/ /path/to/CoRE-ATAC-FeaturePredictor-mm10/ /CoRE-ATAC/CoRE-ATACFeatureExtraction-singularity.sh /my/home/path/.bam /path/to/Core_ATAC/CoRE-ATAC-FeaturePredictor-mm10/.bed mm10 /path/to/mm10.fa /path/to/CoRE-ATAC-FeaturePredictor-mm10/Chromosome/list.txt /path/to/CoRE-ATAC-FeaturePredictor-mm10/Feature_extraction/

I used the -B option to bind my home directory because initially the paths to my input BAM and BED files could not be found ("no such file or directory"). Maybe this is where the error arises: the input files cannot be located for some reason. The same issue persists on both my local machine and the cluster. I moved my input files inside the sandbox directory, but that doesn't seem to solve the issue either.

I'm quite stuck and would very much appreciate any guidance. Thank you for your time.

ajt986 commented 1 year ago

The first exception tells me that the file that lists the different chromosome FASTA files isn't being read correctly. This file is tab delimited and should look like this:

chr1    /projects/chromosomes/chr1.fa.gz
chr2    /projects/chromosomes/chr2.fa.gz
chr3    /projects/chromosomes/chr3.fa.gz

It still looks like the script isn't finding these files, though. The chromosome file is especially important because the peaks are checked against it, and any peak whose chromosome doesn't match an entry in this file is excluded. So if this file is not being read correctly, it will also result in an empty peak list.
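A quick sanity check on that file is to confirm every line splits into exactly two tab-separated fields and that each listed FASTA path actually exists (the file name below is a placeholder):

    # Every line should report 2 fields when split on tabs
    awk -F'\t' '{print NF}' chromosome_list.txt | sort -u

    # Verify that each FASTA path in the second column resolves
    cut -f2 chromosome_list.txt | xargs ls -l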

My suggestion is to enter the Singularity image as a shell using singularity shell and check whether you can access those files from inside the shell with commands like ls. If you find paths that work, using those paths should get the feature extraction running, assuming the peak file etc. are read in the right format.
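A minimal version of that check might look like this (the image path, bind point, and file names are placeholders):

    # Open a shell inside the image, binding the directory that holds your inputs
    singularity shell -B /my/home/path /path/to/CoRE-ATAC-FeaturePredictor-mm10/

    # From inside the container, confirm the inputs resolve at the paths you pass
    ls -l /my/home/path/sample.bam /my/home/path/peaks.narrowPeak
    head /my/home/path/chromosome_list.txt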

loganminhdang commented 2 months ago

Hi Asa,

Sorry for digging up this conversation again after a long while. I have some recent data that motivates a re-analysis with the CoRE-ATAC model. In the FeatureExtractor.sh script, I saw a BAMExtractor.jar file being referenced, but it was not included in the GitHub repository. Could you provide that file?

My thanks in advance, in case you are still maintaining this project.

ajt986 commented 2 months ago

BAMExtractor.jar.zip

No problem! Here is the jar file for the BAMExtractor tool used.

loganminhdang commented 2 months ago

@ajt986 Thank you so much! If you don't mind, would you be able to share the input files you used in your study (e.g., original and reformatted peak files, chromosome list files, etc.)?

When running FeatureExtractor.sh, I still struggle with the same error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
        at org.jax.bamextractor.FileMap.getFileMap(FileMap.java:17)
        at org.jax.bamextractor.ReadExtractor.main(ReadExtractor.java:47)

When I tried running PeakFormatter.py on its own, my peak files were not recognized, resulting in empty output. My peak files come from MACS2 peak calling, so I'm unsure why the peak-formatting step didn't work. I also retried the BAMExtractor.jar step using 500-bp fixed-width peaks that I generated, but that didn't work either.

If you could provide the input files that you used, I would be grateful. Thank you!

ajt986 commented 2 months ago

HEPG2_shifted_peaks.filtered.zip

Hello,

Here's an input peak file that I used. See if you get formatted peak results with this file using the PeakFormatter script. If it works, the next step is to see what's different in the formatting of your file; if it doesn't, we'll need to figure out what's going wrong with the formatting script.
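For example, a side-by-side format check between the two files could look like this (file names are placeholders; MACS2 narrowPeak output should have 10 tab-separated fields):

    # Compare delimiters: tabs show up as ^I on GNU systems
    head -1 my_macs2_peaks.narrowPeak | cat -A
    head -1 HEPG2_shifted_peaks.filtered | cat -A

    # Field count of the first line of each file when split on tabs
    awk -F'\t' 'FNR==1{print FILENAME, NF}' my_macs2_peaks.narrowPeak HEPG2_shifted_peaks.filtered

    # Chromosome names must match the chromosome list exactly (e.g. "chr1" vs "1")
    cut -f1 my_macs2_peaks.narrowPeak | sort -u | head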

Best,
Asa

loganminhdang commented 2 months ago

Thank you for your support, Asa! I finally managed to get the pipeline to finish and was able to generate predictions, which seem quite sensible.

I do have one follow-up question. My ATAC-seq data is derived from mouse, and whenever I try to use a mouse-specific motif file for the feature extraction step (by editing the line HOMERMOTIFS=${path}PEAS/humantop_Nov2016_HOMER.motifs to point to a mouse-specific motif file), I encounter an error that doesn't happen when I use the original human motif file:

ValueError: Input 0 of layer peas_dense is incompatible with the layer: expected axis -1 of input shape to have value 30 but received input with shape [None, 29].

The full error message is below:

WARNING:tensorflow:Model was constructed with shape (None, 30, 1) for input Tensor("peas_input:0", shape=(None, 30, 1), dtype=float32), but it was called on an input with incompatible shape (None, 29, 1).
Traceback (most recent call last):
  File "/CoRE-ATAC/CoREATAC_PredictionTool.py", line 82, in <module>
    predict(datadirectory, basename, model, outputfile, featurefile, labelencoder, swapchannels)
  File "/CoRE-ATAC/CoREATAC_PredictionTool.py", line 73, in predict
    sig_predictions, peas_predictions, predictions = model.predict([x_test_sigseq, x_test_peas])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 88, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1268, in predict
    tmp_batch_outputs = predict_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 506, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2667, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)

ValueError: in user code:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1147 predict_function  *
    outputs = self.distribute_strategy.run(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
    return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1122 predict_step  **
    return self(x, training=False)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:927 __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py:719 call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py:888 _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:886 __call__
    self.name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/input_spec.py:216 assert_input_compatibility
    ' but received input with shape ' + str(shape))

ValueError: Input 0 of layer peas_dense is incompatible with the layer: expected axis -1 of input shape to have value 30 but received input with shape [None, 29]. 

Do you have any idea what could be the issue? Thank you very much in advance.

ajt986 commented 2 months ago

Are you changing the HOMER reference to mouse (e.g., mm10), and was this reference added to the HOMER installation inside the Singularity container?
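For reference, adding the mm10 reference to HOMER inside a writable copy of the container would typically go through configureHomer.pl, roughly as sketched below (the /HOMER install path is inferred from the earlier error messages and may differ in your image; the sandbox name is a placeholder):

    # Enter a writable sandbox copy of the image
    sudo singularity shell --writable CoRE-ATAC_sandbox/

    # Inside the container: download and install the mm10 genome/annotation for HOMER
    perl /HOMER/configureHomer.pl -install mm10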

loganminhdang commented 2 months ago

The issue was indeed related to that (the motif file needs to be updated inside the Singularity container). Thanks a lot for your help, Asa!