JaneliaSciComp / multifish

EASI-FISH analysis pipeline for spatial transcriptomics
BSD 3-Clause "New" or "Revised" License

Registration glitch in s3 resolution #40

Closed · FangmingXie closed 1 year ago

FangmingXie commented 1 year ago

Bug report

Description of the problem

I tried to register two rounds of data using def_scale=s3 and aff_scale=s4. The pipeline ran nearly to completion, but errored out with Error executing process > 'registration:final_transform (4)'.

This error can be reproduced using the demo_medium data with def_scale=s3. Upon checking, I believe this is caused by a glitch: the default scale s2 is still hard-coded somewhere in the process (see the error message and the sketch below), causing the inconsistency.

Does this make sense, and could you help me resolve it? Thanks so much!

Log file(s)

May-29 22:24:58.258 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'registration:final_transform (4)'

Caused by:
  Process `registration:final_transform (4)` terminated with an error exit status (1)

Command executed:

  echo "Final transform"
  # Must remove the output directory, or we get a zarr.errors.ContainsArrayError if it already exists
  rm -rf /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/registration/LHA3_R5_medium-to-LHA3_R3_medium/warped/c3/s3 || true
  umask 0002
  /app/scripts/waitforpaths.sh /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R3_medium/stitching/export.n5/c3/s3 /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/stitching/export.n5/c3/s3
  /entrypoint.sh apply_transform_n5 /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R3_medium/stitching/export.n5 /c3/s3 /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/stitching/export.n5 /c3/s3 /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/registration/LHA3_R5_medium-to-LHA3_R3_medium/transform /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/registration/LHA3_R5_medium-to-LHA3_R3_medium/warped
  echo "Finished final transform for /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/registration/LHA3_R5_medium-to-LHA3_R3_medium/warped/c3/s3"

Command exit status:
  1

Command output:
  Final transform
  Checking for /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R3_medium/stitching/export.n5/c3/s3
  Checking for /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/stitching/export.n5/c3/s3

Command error:
  INFO:    Could not find any nv files on this host!
  INFO:    Converting SIF file to temporary sandbox...
  Final transform
  Checking for /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R3_medium/stitching/export.n5/c3/s3
  Checking for /u/scratch/f/f7xiesnm/demo_medium_v2/outputs/LHA3_R5_medium/stitching/export.n5/c3/s3
  Traceback (most recent call last):
    File "/app/bigstream/apply_transform_n5.py", line 73, in <module>
      grid       = read_n5_transform(txm_path, '/s2')
    File "/app/bigstream/apply_transform_n5.py", line 42, in read_n5_transform
      grid = txm_n5['/c0'+subpath].shape[::-1]
    File "/opt/conda/envs/myenv/lib/python3.8/site-packages/zarr/hierarchy.py", line 349, in __getitem__
      raise KeyError(item)
  KeyError: '/c0/s2'
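
For reference, here is a minimal sketch of what I suspect is happening, assuming the script opens the transform N5 with zarr (which the traceback suggests); the function and variable names are illustrative, not the actual bigstream code:

  # The deformation grid lives under /c0/<scale> in the transform N5.
  # Hard-coding '/s2' breaks any run with def_scale=s3: the lookup
  # becomes '/c0/s2', which does not exist, hence the KeyError above.
  import zarr

  def read_transform_grid(txm_path, subpath):
      # subpath should be derived from def_scale (e.g. '/s3'),
      # never a literal '/s2'
      txm_n5 = zarr.open(store=zarr.N5Store(txm_path), mode='r')
      return txm_n5['/c0' + subpath].shape[::-1]

  # what the traceback shows:      read_transform_grid(txm_path, '/s2')
  # what this run actually needs:  read_transform_grid(txm_path, '/s3')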

Environment

Additional context

No.

cgoina commented 1 year ago

@FangmingXie I fixed this - please let me know if you still run into any issues because of scale inconsistencies.

FangmingXie commented 1 year ago

@cgoina Thanks, this is amazing! Just to make sure: do I just need to pull from the repo now, redo ./setup.sh, and rerun everything?

cgoina commented 1 year ago

Yes, all you need is to pull the latest. I don't think you even need to rerun setup.sh if you simply update the code you already have checked out; if you do a fresh clone, then yes, you need setup.sh. Then you should be able to re-run and possibly skip the stitching.

FangmingXie commented 1 year ago

@cgoina I ran into a different issue after pulling the latest version and rerunning. I suspect this might be related to the Singularity version, as I have had similar issues before. I am using Singularity 3.8.5, which worked for the version I pulled in early May, but not now. Do you have any idea why?

I can rerun demo_medium and see whether this issue is reproduced by the new update.

Jun-02 11:13:19.803 [Actor Thread 46] ERROR nextflow.processor.TaskProcessor - Error executing process > 'registration:fixed_coarse_spots (1)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685729593636 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3 > /dev/null
  status : 255
  message:
    INFO:    Converting OCI blobs to SIF format
    FATAL:   While making image from oci registry: error fetching image to cache: while building SIF from layers: unable to create new build: while searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH

cgoina commented 1 year ago

Maybe the image did not upload correctly. I re-uploaded it, then tried downloading it from the ECR registry to verify that the command singularity pull --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685729593636 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3 works, and it did. I used singularity version 3.8.0-1.el8 and apptainer version 1.1.6-1.el9 to pull it, and both worked. If you have admin privileges, check that squashfs-tools is installed on the machine; Singularity needs it. I have also seen a similar error (not with this file, but in other cases) when my $HOME/.singularity directory was corrupted; my fix was to completely remove the ~/.singularity directory and try again.
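
If it helps, here is a rough pre-flight check for those two failure modes; it is only a sketch using the Python standard library, and the paths are defaults you may need to adjust:

  # singularity shells out to mksquashfs when converting OCI layers to
  # a SIF; if it is missing, install the squashfs-tools package
  import os
  import shutil

  if shutil.which('mksquashfs') is None:
      print('mksquashfs not found on $PATH -- install squashfs-tools')

  # a corrupted per-user cache can also break pulls; removing it is
  # safe, singularity rebuilds it on the next pull
  cache = os.path.expanduser('~/.singularity')
  print(cache, 'exists' if os.path.isdir(cache) else 'absent')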

FangmingXie commented 1 year ago

Thanks, I tried this, but it still complains about pulling the singularity image. I tried deleting ~/.singularity and reverting to the older commit (the one that always worked), but neither of them works. Do you have any idea?

Jun-02 22:38:18.396 [Actor Thread 28] ERROR nextflow.processor.TaskProcessor - Error executing process > 'registration:cut_tiles (1)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685770638007 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3 > /dev/null
  status : 255
  message:
    FATAL:   While making image from oci registry: error fetching image to cache: failed to get checksum for docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3: pinging container registry public.ecr.aws: Get "https://public.ecr.aws/v2/": dial tcp 99.83.145.10:443: i/o timeout

java.lang.IllegalStateException: java.lang.IllegalStateException: Failed to pull singularity image
  command: singularity pull  --name public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img.pulling.1685770638007 docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3 > /dev/null
  status : 255
  message:
    FATAL:   While making image from oci registry: error fetching image to cache: failed to get checksum for docker://public.ecr.aws/janeliascicomp/multifish/registration:1.2.3: pinging container registry public.ecr.aws: Get "https://public.ecr.aws/v2/": dial tcp 99.83.145.10:443: i/o timeout

Jun-02 23:23:39.094 [Actor Thread 22] ERROR nextflow.processor.TaskProcessor - Error executing process > 'spot_extraction:rsfish:spark_cluster:prepare_spark_work_dir'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name public.ecr.aws-janeliascicomp-multifish-rs_fish-1.0.1.img.pulling.1685773413330 docker://public.ecr.aws/janeliascicomp/multifish/rs_fish:1.0.1 > /dev/null
  status : 255
  message:
    INFO:    Converting OCI blobs to SIF format
    FATAL:   While making image from oci registry: error fetching image to cache: while building SIF from layers: unable to create new build: while searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH

cgoina commented 1 year ago

@FangmingXie the container actually looks good - I tested the registration locally and the container was retrieved correctly. Check that you have enough disk space on the volume where /tmp is mounted; I have also seen this problem when I didn't have enough disk space. To fix that, you can set the TMPDIR environment variable to a location with enough space.
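
Something like the following can verify the free space; it is just a sketch (the real fix is exporting TMPDIR in your shell before launching the pipeline, e.g. TMPDIR=/some/big/scratch nextflow run ..., where the path is whatever has room on your cluster):

  # report free space on the volume backing the temp directory;
  # the SIF conversion fails in odd ways when this volume fills up
  import os
  import shutil

  tmp = os.environ.get('TMPDIR', '/tmp')
  total, used, free = shutil.disk_usage(tmp)
  print(f'{tmp}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB')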

cgoina commented 1 year ago

@FangmingXie please do not use the 1.2.3 container yet. I found another problem; I will update the ticket when it is fixed.

FangmingXie commented 1 year ago

@cgoina Thanks -- I have resolved the container problem on my side. It turned out to be a problem with my local cluster rather than the pipeline.

Meanwhile, I seem to have hit the same issue with registration 1.2.3 that you just described:

Command exit status:
  1

Command output:
  Final transform
  Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R3_tiny/stitching/export.n5/c1/s2
  Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R5_tiny/stitching/export.n5/c1/s2

Command error:
  INFO:    Could not find any nv files on this host!
  INFO:    Converting SIF file to temporary sandbox...
  Final transform
  Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R3_tiny/stitching/export.n5/c1/s2
  Checking for /u/scratch/f/f7xiesnm/demo_tiny/outputs/LHA3_R5_tiny/stitching/export.n5/c1/s2
  Traceback (most recent call last):
    File "/app/bigstream/apply_transform_n5.py", line 73, in <module>
      grid       = read_n5_transform(txm_path, ref_img_subpath)
    File "/app/bigstream/apply_transform_n5.py", line 42, in read_n5_transform
      grid = txm_n5['/c0'+subpath].shape[::-1]
    File "/opt/conda/envs/myenv/lib/python3.8/site-packages/zarr/hierarchy.py", line 349, in __getitem__
      raise KeyError(item)
  KeyError: '/c0/c1/s2'

cgoina commented 1 year ago

This is really fixed now. I have not re-tagged the container on ECR, so if you already have an image of public.ecr.aws-janeliascicomp-multifish-registration-1.2.3.img in your singularity_cache, please remove it first and then try again. If this is a problem for you, I will bump the container version.
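
Something along these lines should clear the stale image; NXF_SINGULARITY_CACHEDIR is the variable Nextflow reads for its singularity cache, and the fallback path here is only a guess for a typical setup:

  # remove any cached copy of the 1.2.3 registration image so the
  # re-tagged container is pulled fresh on the next run
  import glob
  import os

  cache_dir = os.environ.get('NXF_SINGULARITY_CACHEDIR',
                             os.path.expanduser('~/singularity_cache'))
  for img in glob.glob(os.path.join(cache_dir, '*multifish-registration-1.2.3.img')):
      os.remove(img)
      print('removed', img)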

FangmingXie commented 1 year ago

@cgoina Thanks, this is great -- I will do as you instructed and get back to you once I have confirmed it works on my side.

FangmingXie commented 1 year ago

@cgoina Thanks, it worked for me!