FangmingXie closed this issue 1 year ago.
@FangmingXie I fixed this - please let me know if you still run into any issues because of scale inconsistencies.
@cgoina Thanks this is amazing! Just to make sure: I just need to pull from the repo now, redo ./setup.sh, and rerun everything?
Yes, all you need is to pull the latest. I don't think you even need to rerun setup.sh if you simply update the code you already have checked out. If you do another clone, then yes, you need setup.sh. Then you should be able to re-run and maybe skip the stitching.
@cgoina I got a different issue after pulling the latest version and rerunning. I feel like this might be related to the Singularity version, as I got similar issues before. I am using Singularity 3.8.5, which worked for the version I pulled in early May, but not now. Do you have any idea why?
I can rerun demo_medium to see whether this issue is reproduced by the new update.
Jun-02 11:13:19.803 [Actor Thread 46] ERROR nextflow.processor.TaskProcessor - Error executing process > 'registration:fixed_coarse_spots (1)'
Caused by:
Failed to pull singularity image
command: singularity pull --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685729593636 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3 > /dev/null
status : 255
message:
INFO: Converting OCI blobs to SIF format
FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: unable to create new build: while searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH
Maybe the image did not upload correctly. I re-uploaded it, then tried to download it from the ECR registry to test that this command works: singularity pull --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685729593636 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3. It did. I used singularity version 3.8.0-1.el8 and apptainer version 1.1.6-1.el9 to pull it, and both worked. If you have admin privileges, check that squashfs-tools is installed on the machine; Singularity needs it. I also got a similar error (not for this file, but in other cases) when my $HOME/.singularity directory was messed up, and my fix was to completely remove the ~/.singularity directory and try again.
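A quick way to run both checks at once (a sketch; squashfs-tools is the usual package providing mksquashfs, but the package name may differ on your distro, and the cleanup step is left commented out because it wipes all cached images):

```shell
#!/bin/sh
# Singularity shells out to mksquashfs (from squashfs-tools) when it
# converts OCI layers into a SIF image; check that it is on the PATH.
if command -v mksquashfs >/dev/null 2>&1; then
    echo "mksquashfs found: $(command -v mksquashfs)"
else
    echo "mksquashfs missing: install squashfs-tools (or ask an admin)"
fi

# If the tool is present but pulls still fail, a corrupted state
# directory is another suspect; removing it forces a clean re-download:
#   rm -rf "$HOME/.singularity"
```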
Thanks, I tried this but it still complains about pulling the singularity image. I tried deleting ~/.singularity and reverting to the older commit (the one that always worked), but neither works. Do you have any idea?
Jun-02 22:38:18.396 [Actor Thread 28] ERROR nextflow.processor.TaskProcessor - Error executing process > 'registration:cut_tiles (1)'
Caused by:
Failed to pull singularity image
command: singularity pull --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685770638007 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3 > /dev/null
status : 255
message:
FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3: pinging container registry public.ecr.aws: Get "https://public.ecr.aws/v2/": dial tcp 99.83.145.10:443: i/o timeout
java.lang.IllegalStateException: java.lang.IllegalStateException: Failed to pull singularity image
command: singularity pull --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685770638007 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3 > /dev/null
status : 255
message:
FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3: pinging container registry public.ecr.aws: Get "https://public.ecr.aws/v2/": dial tcp 99.83.145.10:443: i/o timeout
Jun-02 23:23:39.094 [Actor Thread 22] ERROR nextflow.processor.TaskProcessor - Error executing process > 'spot_extraction:rsfish:spark_cluster:prepare_spark_work_dir'
Caused by:
Failed to pull singularity image
command: singularity pull --name public.ecr.aws-janeliascicomp-multifish-rs_fish-1.0.1.img.pulling.1685773413330 docker://public.ecr.aws/janeliascicomp/multifish/rs_fish:1.0.1 > /dev/null
status : 255
message:
INFO: Converting OCI blobs to SIF format
FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: unable to create new build: while searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH
@FangmingXie the container looks good actually - I tested the registration locally and the container was retrieved correctly. Check that you have enough disk space on the volume where /tmp is mounted; I have also seen this problem when I didn't have enough disk space. To fix it, set the TMPDIR environment variable to a location with enough space.
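For example (a sketch; "$PWD/nf-tmp" is only a placeholder for any volume with a few GB free):

```shell
#!/bin/sh
# Redirect Singularity's scratch space away from a cramped /tmp.
# "$PWD/nf-tmp" is just an example; point it at any roomy volume.
export TMPDIR="$PWD/nf-tmp"
mkdir -p "$TMPDIR"
echo "TMPDIR set to $TMPDIR"
# ...then launch the pipeline from this same shell so it inherits TMPDIR.
```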
@FangmingXie please do not use the 1.2.3 container yet; I found another problem and will update the ticket when it is fixed.
@cgoina Thanks -- I have resolved the container problem on my side. It turned out to be a problem with my local cluster, not the pipeline.
Meanwhile, I probably hit the same issue with registration 1.2.3 that you just described:
Command exit status:
1
Command output:
Final transform
Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R3_tiny/stitching/export.n5/c1/s2
Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R5_tiny/stitching/export.n5/c1/s2
Command error:
INFO: Could not find any nv files on this host!
INFO: Converting SIF file to temporary sandbox...
Final transform
Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R3_tiny/stitching/export.n5/c1/s2
Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R5_tiny/stitching/export.n5/c1/s2
Traceback (most recent call last):
File "/app/bigstream/apply_transform_n5.py", line 73, in <module>
grid = read_n5_transform(txm_path, ref_img_subpath)
File "/app/bigstream/apply_transform_n5.py", line 42, in read_n5_transform
grid = txm_n5['/c0'+subpath].shape[::-1]
File "/opt/conda/envs/myenv/lib/python3.8/site-packages/zarr/hierarchy.py", line 349, in __getitem__
raise KeyError(item)
KeyError: '/c0/c1/s2'
This is really fixed now. I have not re-tagged the container on the ec2, so if you already have an image of public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img in your singularity_cache, please remove it first and then try again. If this is a problem for you, I will bump up the container version.
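Concretely, something like this (a sketch; NXF_SINGULARITY_CACHEDIR is Nextflow's usual cache-location variable, and the fallback path is only a guess at your setup; adjust it if your config sets singularity.cacheDir elsewhere):

```shell
#!/bin/sh
# Delete the stale cached image so Nextflow re-pulls the fixed container.
cache_dir="${NXF_SINGULARITY_CACHEDIR:-$HOME/.singularity}"
img="public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img"
rm -f "$cache_dir/$img"
echo "removed (if present): $cache_dir/$img"
```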
@cgoina Thanks, this is great -- I will do as you instructed and get back to you on whether it really works on my side.
@cgoina Thanks it worked for me!
Bug report
Description of the problem
I tried to register 2-round data using def_scale=s3 and aff_scale=s4. The pipeline ran almost all the way through, but errored out at Error executing process > 'registration:final_transform (4)'. This error can be reproduced using demo_medium data with def_scale=s3. Upon checking, I believe this is because of a glitch: the default scale s2 is still hard-coded somewhere in the process (see error message below), causing the inconsistency. Does this make sense, and could you help me resolve this? Thanks so much!
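To illustrate the suspected mismatch (a sketch using a mocked-up directory layout; N5 containers store each scale level as a plain subdirectory, so a hard-coded s2 lookup fails when only s3 was exported, which matches the KeyError: '/c0/c1/s2' in the log):

```shell
#!/bin/sh
# Mock an N5 transform container that was produced with def_scale=s3.
n5="$(mktemp -d)/transform.n5"
mkdir -p "$n5/c0/c1/s3"

# A hard-coded default still asks for s2, which does not exist:
for scale in s2 s3; do
    if [ -d "$n5/c0/c1/$scale" ]; then
        echo "$scale: present"
    else
        echo "$scale: missing -> KeyError: '/c0/c1/$scale'"
    fi
done
```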
Log file(s)
Environment
Additional context
No.