deisseroth-lab / two-photon

Common scripts, libraries, and utilities for 2p experiments

Debug Singularity failure on Sherlock #5

Closed chrisroat closed 4 years ago

chrisroat commented 4 years ago

The container runs fine on the local Ubuntu machine where I built it, but fails when running on Sherlock:

$ singularity run     --bind=$OAK/users/${USER}/test/overview-023:/data     $OAK/pipeline/bruker-rip/containers/bruker-rip.20200903.sif
Setting up wine environment

Executing rip.  It is OK to see 1 err and 4 fixme statements in what follows

2020-09-03 17:01:13.946 rip:50 INFO Ripping from:
 /data/Cycle00001_Filelist.txt
 /data/CYCLE_000001_RAWDATA_000025
2020-09-03 17:01:13.986 rip:96 INFO Waiting for ripper to finish: 3600 seconds remaining
000d:err:menubuilder:init_xdg error looking up the desktop directory
0031:fixme:ntdll:EtwEventRegister ({5eec90ab-c022-44b2-a5dd-fd716a222a15}, 0xd4c1000, 0xd4d2030, 0xd4d2050) stub.
0031:fixme:ntdll:EtwEventSetInformation (deadbeef, 2, 0xd4cfd70, 43) stub
0031:fixme:nls:GetThreadPreferredUILanguages 00000038, 0xdb0cdb4, 0xdb0cdd0 0xdb0cdb0
0031:fixme:nls:get_dummy_preferred_ui_language (0x38 0xdb0cdb4 0xdb0cdd0 0xdb0cdb0) returning a dummy value (current locale)

=================================================================
    Native Crash Reporting
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

=================================================================
    Managed Stacktrace:
=================================================================
      at <unknown> <0xffffffff>
      at Image_Block_Ripping_Utility.frmMain:RipRawImages <0x000e5>
      at Image_Block_Ripping_Utility.frmMain:StartConversion <0x009ca>
      at System.Threading.ThreadHelper:ThreadStart_Context <0x000b2>
      at System.Threading.ExecutionContext:RunInternal <0x001f5>
      at System.Threading.ExecutionContext:Run <0x00052>
      at System.Threading.ExecutionContext:Run <0x0007a>
      at System.Threading.ThreadHelper:ThreadStart <0x0005a>
      at System.Object:runtime_invoke_void__this__ <0x0009f>
=================================================================
wine: Unhandled page fault on read access to 0000000000000050 at address 000000007BC519B5 (thread 0031), starting debugger...
2020-09-03 17:01:23.998 rip:107 INFO   Found filelist files: {PosixPath('/data/Cycle00001_Filelist.txt')}
2020-09-03 17:01:23.998 rip:108 INFO   Found rawdata files: {PosixPath('/data/CYCLE_000001_RAWDATA_000025')}
2020-09-03 17:01:23.998 rip:109 INFO   Found this many tiff files: 0
chrisroat commented 4 years ago

@vsoch Thanks again for all your help. I updated the repo to package and name things a little better. One change within the containers themselves was to re-use the .wine configuration created in the Dockerfile's packaging, rather than recreating it. I also made everything batch-only for now. I'll add back the interactive work if it's asked for.

Everything worked great on my machine. Then I moved the Singularity container to Sherlock, and got the error above -- I tried both on a dev node and a login node.

Wine makes me sad.

vsoch commented 4 years ago

I think it makes sense that you need to use the wine environment generated by the container at runtime, given the specific environment. Who knows how the Docker environment (created at build time within the container as root) compares to the Singularity container, which uses the environment from the HPC node and does not run as root. I don't have insight into why it works locally but not on the cluster, but I would suggest reverting to generation on the fly (on Sherlock, in /tmp), and only if that works looking into this different approach.

vsoch commented 4 years ago

And don't be sad :) We will figure things out!

chrisroat commented 4 years ago

Sorry if I wasn't clear -- while the Dockerfile build creates the .wine directory, I am able to successfully run via Singularity on my Ubuntu 18.04 box with Singularity 3.6. The Sherlock machines are CentOS 7.6.1810 and are using Singularity 3.5.3-1.1.el7.

Are there Singularity guarantees regarding portability across OS and Singularity versions?

vsoch commented 4 years ago

I understood what you said! My suggestion is still to try the original strategy with creating the wine prefix directory at runtime. Singularity can’t technically promise anything, but for the most part can guarantee the “same” container - but that doesn’t account for variables that might leak into the environment.

chrisroat commented 4 years ago

Alas, I'm getting the same error. Switching the wineprefix content creation from buildtime to runtime (in /tmp) gave the same results.

I tried interactively with the same result, as well. But interactively, Windows popped up a window with backtrace, which I'm including here.

backtrace.txt

vsoch commented 4 years ago

Okay, so it's good we could rule out those two being different! Let's try adding more isolation: did you try with --containall and/or --cleanenv?

vsoch commented 4 years ago

Are you installing a 64-bit wine prefix?

vsoch commented 4 years ago

I don't see in your updated recipe where you are calling wineboot --init after having exported the architecture as win64. The backtrace looks like it's using a lot of 32-bit libraries, which is why I ask.

vsoch commented 4 years ago

Can you please try the unedited version that we merged from my PR? The difference is that:

  1. The current recipe seems to use the base image (which is fairly complicated) to create a wine prefix here, but we aren't sure how that's even working. It's overly complicated for what you need, and it doesn't seem to specify win64.
  2. It then copies the prefix to a temporary location here, but this doesn't do anything other than move the location; the wineprefix directory could still be corrupt or have the wrong architecture.
  3. I don't think exporting WINEARCH afterwards will change anything.
  4. And one quick note: some older versions of Singularity won't work well with a comment and empty line before the header section here.

I would step back and take the simplest approach possible. I think the entrypoint of the previous container is overly complicated for what is needed and is probably giving you a bug. I would remove that, and try just using wineboot --init after the variable export, installing what you need, and seeing if that doesn't error out.
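For reference, a minimal Python sketch of that "simplest approach": export the architecture first, then create the prefix at runtime. It assumes `wineboot --init` is the prefix-initialisation command meant here and that `vcrun2015` is the only winetricks verb needed; the helper names `prefix_setup_plan` and `run_plan` are made up for illustration and are not part of the repo.

```python
import os
import subprocess


def prefix_setup_plan(workdir, arch="win64"):
    """Return (env, steps) for creating a fresh wine prefix under workdir."""
    env = {
        "WINEPREFIX": os.path.join(workdir, "wineprefix"),
        # WINEARCH only takes effect when the prefix is first created,
        # so it has to be set before the first wine command runs.
        "WINEARCH": arch,
    }
    steps = [
        ["wineboot", "--init"],             # create and initialise the prefix
        ["winetricks", "-q", "vcrun2015"],  # unattended install (assumed verb)
        ["wineserver", "-w"],               # wait for wine processes to exit
    ]
    return env, steps


def run_plan(env, steps, runner=subprocess.run):
    """Run each step with env layered over the current environment."""
    full_env = {**os.environ, **env}
    for argv in steps:
        runner(argv, env=full_env, check=True)


# Dry run: show what would be executed without touching wine.
env, steps = prefix_setup_plan("/tmp/example")
for argv in steps:
    print(env["WINEARCH"], env["WINEPREFIX"], " ".join(argv))
```

Calling `run_plan(env, steps)` inside the container would execute the steps for real. Keeping the plan separate from the runner makes it easy to log exactly what ran when comparing hosts.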

chrisroat commented 4 years ago

I have been trying both with and without wineboot --init/--end-session. I removed it originally because it was not necessary for running on my machine.

I believe there are still some 32-bit pieces of Windows. The wine installation does include the i386 architecture to make a multiarch setup. It's not clear it's an architecture size problem because the container runs fine on my 64-bit machine -- but until we solve this, I won't rule anything out.

I hadn't used the cleanenv or contain related flags before. --cleanenv gives the same results. Using --contain or --containall fails due to all sorts of I/O errors. Too much isolation?!

I also tried updating to wine 5.16 (development version) from wine 5.02, sticking with the on-the-fly winetricks and using the wineboot. Still getting the same fault.

I'm done for this evening!

chrisroat commented 4 years ago

I believe I have tried the simplest possibilities at this point -- the current setup is exactly what you suggest. I have removed the entrypoint-based winetricks from the Docker image (as noted in this previous comment), and ran it instead in the runscript. I have also experimented with wineboot --init.

I have also let wine create the directory as my own user on Sherlock, skipping the temp directory altogether.

At this point, I've methodically gone through the combinations I can think of. It's not clear to me what to focus on. The fact that it runs on my 64-bit machine seems to indicate that the container is OK and the wineprefix contents are not likely the issue.

My next approach will be to try building the image on CentOS instead of Ubuntu. I tried earlier with a docker-in-docker approach, but failed. I'll likely bring up a cloud VM.

vsoch commented 4 years ago

Yeah no worries! So my gut is saying that the issue is using the wineprefix generated in the container and then having the wrong architecture, and since you are done for the evening I went ahead and tested the version that was merged from my branch on Sherlock. I just cloned my branch again, and transferred the Singularity image and data to Sherlock. I tested on an interactive node with X11 (e.g., ssh -XY ... and then srun --x11). One huge detail that we don't account for is the fact that the Sherlock version of Singularity doesn't have support for --env, so the variables have to be passed with the SINGULARITYENV_ prefix and the command needs to look like this instead (and this should be added to your docs).

SINGULARITYENV_DISPLAY=:95 SINGULARITYENV_XVFB_RESOLUTION=320x240x8 SINGULARITYENV_XVFB_SCREEN=0 SINGULARITYENV_XVFB_SERVER=:95 singularity run --bind ${PWD}/profiles:/PROFILES --bind ${PWD}/overview-23:/data two-photon.sif
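For anyone wrapping this call in a script, here is a rough Python sketch of the same workaround. Only the SINGULARITYENV_ prefix convention and the `singularity run --bind` syntax come from the command above; the helper names `singularity_env` and `run_argv` are hypothetical.

```python
import os
import shlex
from pathlib import Path


def singularity_env(variables, base_env=None):
    """Copy base_env and add each variable as SINGULARITYENV_<NAME>."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in variables.items():
        env[f"SINGULARITYENV_{name}"] = str(value)
    return env


def run_argv(image, binds):
    """Build the `singularity run` argument list with one --bind per pair."""
    argv = ["singularity", "run"]
    for host_path, container_path in binds:
        argv += ["--bind", f"{host_path}:{container_path}"]
    argv.append(str(image))
    return argv


xvfb = {
    "DISPLAY": ":95",
    "XVFB_RESOLUTION": "320x240x8",
    "XVFB_SCREEN": 0,
    "XVFB_SERVER": ":95",
}
env = singularity_env(xvfb, base_env={})  # empty base only to keep the printout short
argv = run_argv(
    "two-photon.sif",
    [(Path.cwd() / "profiles", "/PROFILES"), (Path.cwd() / "overview-23", "/data")],
)
prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
print(prefix, shlex.join(argv))
# To launch for real: subprocess.run(argv, env=singularity_env(xvfb), check=True)
```

Because the variables travel in the process environment rather than in a flag, this form should behave the same on Singularity versions with and without --env.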

But other than that, it seems to work the same as on my host. Is this what you saw?

```
$ SINGULARITYENV_DISPLAY=:95 SINGULARITYENV_XVFB_RESOLUTION=320x240x8 SINGULARITYENV_XVFB_SCREEN=0 SINGULARITYENV_XVFB_SERVER=:95 singularity shell two-photon.sif
Singularity> env | grep XVFB
XVFB_RESOLUTION=320x240x8
XVFB_SERVER=:95
XVFB_SCREEN=0
Singularity> exit
exit
[vsochat@sh02-01n42 /scratch/users/vsochat/two-photon]$ SINGULARITYENV_DISPLAY=:95 SINGULARITYENV_XVFB_RESOLUTION=320x240x8 SINGULARITYENV_XVFB_SCREEN=0 SINGULARITYENV_XVFB_SERVER=:95 singularity run --bind ${PWD}/profiles:/PROFILES --bind ${PWD}/overview-23:/data two-photon.sif
Creating and changing into temporary directory /tmp/tmp.HmCuMp1dMF...
Setting up wine prefix...
wine: created the configuration directory '/tmp/tmp.HmCuMp1dMF/wineprefix'
0012:err:ole:marshal_object couldn't get IPSFactory buffer for interface {00000131-0000-0000-c000-000000000046}
0012:err:ole:marshal_object couldn't get IPSFactory buffer for interface {6d5140c1-7436-11ce-8034-00aa006009fa}
0012:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hres=0x80004002
0012:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, 80004002
0012:err:ole:get_local_server_stream Failed: 80004002
0014:err:ole:marshal_object couldn't get IPSFactory buffer for interface {00000131-0000-0000-c000-000000000046}
0014:err:ole:marshal_object couldn't get IPSFactory buffer for interface {6d5140c1-7436-11ce-8034-00aa006009fa}
0014:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hres=0x80004002
0014:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, 80004002
0014:err:ole:get_local_server_stream Failed: 80004002
0017:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
0017:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
0019:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 5)
0019:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 5)
0017:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
0017:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
001f:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
001f:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
001f:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
001f:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
0010:fixme:dwmapi:DwmIsCompositionEnabled 0000000006C20434
0021:fixme:iphlpapi:NotifyIpInterfaceChange (family 0, callback 0x2b5306d, context 0x5440b0, init_notify 0, handle 0x7a1fa00): stub
0010:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
005a:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
005a:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
005a:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
005a:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
0058:fixme:dwmapi:DwmIsCompositionEnabled 05E1DD14
005c:fixme:iphlpapi:NotifyIpInterfaceChange (family 0, callback 0x259f537, context 0x2a3890, init_notify 0, handle 0x6c2fce8): stub
0058:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
X connection to localhost:71.0 broken (explicit kill or server shutdown).
srun: error: _half_duplex: wrote -1 of 32
wine: configuration in L"/tmp/tmp.HmCuMp1dMF/wineprefix" has been updated.
Installing C++ libraries...
Executing mkdir -p /tmp/tmp.HmCuMp1dMF
------------------------------------------------------
You are using a 64-bit WINEPREFIX. Note that many verbs only install 32-bit versions of packages. If you encounter problems, please retest in a clean 32-bit WINEPREFIX before reporting a bug.
------------------------------------------------------
Using winetricks 20200412-next - sha256sum: 57c09343a9a09359b7f7556113f36670037a3d860848113283a36f34b9388562 with wine-5.0.2 and WINEARCH=win64
Executing w_do_call vcrun2015
Executing mkdir -p /tmp/tmp.HmCuMp1dMF
------------------------------------------------------
You are using a 64-bit WINEPREFIX. Note that many verbs only install 32-bit versions of packages. If you encounter problems, please retest in a clean 32-bit WINEPREFIX before reporting a bug.
------------------------------------------------------
Executing load_vcrun2015
Executing mkdir -p /home/users/vsochat/.cache/winetricks/vcrun2015
Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015
Downloading https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x86.exe to /home/users/vsochat/.cache/winetricks/vcrun2015
--2020-09-03 21:38:35-- https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x86.exe
Resolving download.microsoft.com (download.microsoft.com)... 104.84.227.57, 2600:1406:3c:49b::e59, 2600:1406:3c:483::e59
Connecting to download.microsoft.com (download.microsoft.com)|104.84.227.57|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13767776 (13M) [application/octet-stream]
Saving to: 'vc_redist.x86.exe'
vc_redist.x86.exe 100%[=======================================================>] 13.13M --.-KB/s in 0.1s
2020-09-03 21:38:36 (107 MB/s) - 'vc_redist.x86.exe' saved [13767776/13767776]
Executing cd /tmp/tmp.HmCuMp1dMF
------------------------------------------------------
Working around wine bug 37781
------------------------------------------------------
------------------------------------------------------
This may fail in non-XP mode, see https://bugs.winehq.org/show_bug.cgi?id=37781
------------------------------------------------------
Using native,builtin override for following DLLs: api-ms-win-crt-private-l1-1-0 api-ms-win-crt-conio-l1-1-0 api-ms-win-crt-heap-l1-1-0 api-ms-win-crt-locale-l1-1-0 api-ms-win-crt-math-l1-1-0 api-ms-win-crt-runtime-l1-1-0 api-ms-win-crt-stdio-l1-1-0 api-ms-win-crt-time-l1-1-0 atl140 concrt140 msvcp140 msvcr140 ucrtbase vcomp140 vcruntime140
Executing wine regedit /S C:\windows\Temp\override-dll.reg
Executing wine64 regedit /S C:\windows\Temp\override-dll.reg
The operation completed successfully
Setting Windows version to winxp
Executing wine regedit /S C:\windows\Temp\set-winver.reg
Executing wine64 regedit /S C:\windows\Temp\set-winver.reg
------------------------------------------------------
Running /usr/bin/wineserver -w. This will hang until all wine processes in prefix=/tmp/tmp.HmCuMp1dMF/wineprefix terminate
------------------------------------------------------
00dc:fixme:rpc:handle_bind_error unexpected status value 1765
0094:err:rpc:RpcAssoc_BindConnection rejected bind for reason 0
0091:fixme:service:scmdatabase_autostart_services Auto-start service L"PlugPlay" failed to start: 1053
0091:fixme:service:scmdatabase_autostart_services Auto-start service L"winebus" failed to start: 1115
0091:fixme:service:scmdatabase_autostart_services Auto-start service L"MountMgr" failed to start: 1115
Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015
Executing wine vc_redist.x86.exe /q
0009:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
0009:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
0009:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub
002e:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
002e:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
002e:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub
002e:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{74d0e5db-b326-4dae-a6b2-445b9de1836e}\\", 00000000): stub
0009:fixme:ole:CoInitializeSecurity (0032F5F4,-1,00000000,00000000,6,2,00000000,12288,00000000) - stub!
0032:fixme:shell:SHAutoComplete stub
002e:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{74d0e5db-b326-4dae-a6b2-445b9de1836e}\\", 00000000): stub
0009:fixme:wuapi:automatic_updates_Pause
0009:fixme:sfc:SRSetRestorePointW 0032F4C8 0032F6D8
0033:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30"
0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28"
0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30"
0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28"
0009:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0031:fixme:event:wait_for_withdrawn_state window 0x20038/800001 wait timed out
0030:fixme:event:wait_for_withdrawn_state window 0x2003a/e00001 wait timed out
0009:fixme:wuapi:automatic_updates_Resume
Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015
Downloading https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x64.exe to /home/users/vsochat/.cache/winetricks/vcrun2015
--2020-09-03 21:38:48-- https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x64.exe
Resolving download.microsoft.com (download.microsoft.com)... 104.68.126.243, 2600:1406:3c:49b::e59, 2600:1406:3c:483::e59
Connecting to download.microsoft.com (download.microsoft.com)|104.68.126.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14572000 (14M) [application/octet-stream]
Saving to: 'vc_redist.x64.exe'
vc_redist.x64.exe 100%[=======================================================>] 13.90M 11.2MB/s in 1.2s
2020-09-03 21:38:50 (11.2 MB/s) - 'vc_redist.x64.exe' saved [14572000/14572000]
Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015
Current Wine does not have Wine bug 30713, so not applying workaround
Executing wine vc_redist.x64.exe /q
0037:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
0037:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
0037:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub
003f:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
003f:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub
003f:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub
003f:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{e46eca4f-393b-40df-9f49-076faf788d83}\\", 00000000): stub
0037:fixme:ole:CoInitializeSecurity (0032F5F4,-1,00000000,00000000,6,2,00000000,12288,00000000) - stub!
0044:fixme:shell:SHAutoComplete stub
003f:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{e46eca4f-393b-40df-9f49-076faf788d83}\\", 00000000): stub
0037:fixme:wuapi:automatic_updates_Pause
0037:fixme:sfc:SRSetRestorePointW 0032F4C8 0032F6D8
0043:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30"
0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28"
0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30"
0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28"
0037:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0042:fixme:event:wait_for_withdrawn_state window 0x20062/c00001 wait timed out
0041:fixme:event:wait_for_withdrawn_state window 0x20058/e00001 wait timed out
0037:fixme:wuapi:automatic_updates_Resume
Containerizing apps directory...
X connection to :99 broken (explicit kill or server shutdown).
Link created
Containerizing user profile...
This user profile is newly generated...
Detected /data folder bound!
/usr/bin/python3 /app/rip.py --directory /data
2020-09-03 21:38:57.584 rip:50 INFO Ripping from:
 /data/Cycle00001_Filelist.txt
 /data/CYCLE_000001_RAWDATA_000025
2020-09-03 21:38:57.601 rip:96 INFO Waiting for ripper to finish: 3600 seconds remaining
0055:fixme:ntdll:EtwEventRegister ({5eec90ab-c022-44b2-a5dd-fd716a222a15}, 0x6ec1000, 0x6ed2030, 0x6ed2050) stub.
0055:fixme:ntdll:EtwEventSetInformation (deadbeef, 2, 0x6ecfd70, 43) stub
0055:fixme:nls:GetThreadPreferredUILanguages 00000038, 0x750cdb4, 0x750cdd0 0x750cdb0
0055:fixme:nls:get_dummy_preferred_ui_language (0x38 0x750cdb4 0x750cdd0 0x750cdb0) returning a dummy value (current locale)
2020-09-03 21:39:07.606 rip:107 INFO Found filelist files: None
2020-09-03 21:39:07.607 rip:108 INFO Found rawdata files: None
2020-09-03 21:39:07.607 rip:109 INFO Found this many tiff files: 1
2020-09-03 21:39:07.607 rip:96 INFO Waiting for ripper to finish: 3590 seconds remaining
2020-09-03 21:39:17.617 rip:107 INFO Found filelist files: None
2020-09-03 21:39:17.618 rip:108 INFO Found rawdata files: None
2020-09-03 21:39:17.618 rip:109 INFO Found this many tiff files: 1
2020-09-03 21:39:17.618 rip:112 INFO Detected ripping is complete
2020-09-03 21:39:27.626 rip:114 INFO Killing ripper
2020-09-03 21:39:27.626 rip:116 INFO Ripper has been killed
2020-09-03 21:39:28.628 rip:88 INFO cleaned up!
```

And the tiff is generated too.

$ ls overview-23/
overview-023_Cycle00001_Ch3_000001.ome.tif  overview-023.env  overview-023.xml  References
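To help read the rip log lines in the two runs: in the failing run the filelist and rawdata files are still present and the TIFF count is 0, while in the working run both inputs are gone and the TIFF count stays at 1 across two polls before "Detected ripping is complete". The sketch below is one plausible reading of that behaviour, not the actual rip.py logic; the glob patterns and the function names `ripping_state` and `looks_complete` are assumptions based only on the file names shown in this thread.

```python
from pathlib import Path


def ripping_state(directory):
    """Summarise the files a polling wrapper could watch in a data directory."""
    directory = Path(directory)
    return {
        "filelists": sorted(p.name for p in directory.glob("*_Filelist.txt")),
        "rawdata": sorted(p.name for p in directory.glob("*_RAWDATA_*")),
        "tiffs": len(list(directory.glob("*.ome.tif"))),
    }


def looks_complete(previous, current):
    """Hypothetical rule: inputs consumed, TIFFs exist, count stable between polls."""
    return (
        previous is not None
        and not current["filelists"]
        and not current["rawdata"]
        and current["tiffs"] > 0
        and previous["tiffs"] == current["tiffs"]
    )


# Example: poll a directory twice (a real wrapper would sleep between polls).
first = ripping_state(".")
second = ripping_state(".")
print(second, looks_complete(first, second))
```

Under this reading, the Sherlock failure never gets past the first condition, because the ripper crashes before consuming its inputs.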

So this is really good news, because it means we do have a working solution! You must have accidentally added some tiny change that broke the build, and it wasn't apparent on your host. If you feel strongly about keeping your changes, you should probably start from the version I made, and then make one change at a time, testing as you go, so you'll know when it breaks (and perhaps have better information to work with to fix it). I'd be really interested to know what the bug turned out to be! You can also just use the version above, which appears to work as expected. Anyhoo, hopefully this will be some good news to wake up to. I should be off to bed soon too, night!

chrisroat commented 4 years ago

I was intrigued by your update, so hopped back on. I built at 4adbb20, and still have the same result on Sherlock.

Can you verify what OS you are using and what version of Singularity you built with?

vsoch commented 4 years ago

Ubuntu 18.04 with 3.6.0!

vsoch commented 4 years ago

I'll upload the container for you if you want to reproduce what I did. I would suggest having an automated build of the Docker container to a registry (using CI; GitHub workflows would work well for this) and then pulling it down to Sherlock (using its Singularity).

chrisroat commented 4 years ago

Thanks. Can you just transfer the sif to an OAK location and make it world readable?

vsoch commented 4 years ago

I don't have the same superpowers as my colleagues (e.g., writing to your OAK space) but I can send you a Google Drive link to download and transfer... low tech but gets the job done! It's uploading now.

chrisroat commented 4 years ago

Not my space... just yours (and chmod it)... if you have any OAK space or access to SCRATCH.

vsoch commented 4 years ago

I can try that - I've never used OAK before. Can you tell me the command for chmod to get the correct permission?

chrisroat commented 4 years ago

Oh -- no worries then. But it would be chmod a+r <filename>

chrisroat commented 4 years ago

Ah, that didn't work. I sometimes get confused how locked down things are. Maybe that drive link after all?

chrisroat commented 4 years ago

(I guess if you made your home directory world readable.... but let's not go there)

vsoch commented 4 years ago

Yeah, I always run into these issues with permissions, I think you have to make the directory world readable, which isn't something I want to do. I gave you access to the drive file, hopefully that will do the trick!
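The reason chmod a+r on the file alone was not enough is standard POSIX behaviour: other users also need the search (execute) bit on every directory above the file. A small Python sketch that lists what is missing, looking at mode bits only (it ignores ACLs and group membership, and the function name `world_read_blockers` is made up for illustration):

```python
import os
import stat
from pathlib import Path


def world_read_blockers(path):
    """List mode-bit reasons why 'other' users could not read path."""
    target = Path(path).resolve()
    blockers = []
    # Walk from the filesystem root down to the file's parent directory:
    # each one needs the 'other' execute (search) bit to be traversable.
    for directory in reversed(target.parents):
        if not os.stat(directory).st_mode & stat.S_IXOTH:
            blockers.append(f"{directory}: missing o+x")
    # The file itself needs the 'other' read bit.
    if not os.stat(target).st_mode & stat.S_IROTH:
        blockers.append(f"{target}: missing o+r")
    return blockers


# Usage (path is a placeholder):
# for problem in world_read_blockers("/path/to/two-photon.sif"):
#     print(problem)
```

If a home or scratch directory shows up in the output, the file stays unreachable for others no matter what its own permissions are, which matches what happened here.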

chrisroat commented 4 years ago

It still does not work for me. Hmph. I used your command and your sif file. Perhaps it's something in our setups?

(BTW, the XVFB env vars aren't to be used for Singularity, so the only one that is needed is the one you rely on in runscript. They are only used by the Docker entrypoint script.)

I removed my dotfiles (i.e. .bashrc and .profile, to give me a "standard" Sherlock environment) and my local .ssh/config (to be sure the connection wasn't the issue, due to all the display wrangling). If it's not too much to ask, can you do the same and try again? And can you run on a login node and tell me which one you use?

I will recreate the same env as you, and if it still fails, I'll file something with SRCC.
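One way to compare two setups is to dump the host variables most likely to leak into the container and diff them. A rough Python sketch follows; the prefix and name lists are guesses at what might matter here, not an authoritative list, and `leak_candidates` and `diff_envs` are hypothetical helper names.

```python
import json
import os

# Assumed-relevant variables; extend as needed.
LEAK_PREFIXES = ("SINGULARITY", "PYTHON", "WINE", "LD_", "XVFB")
LEAK_NAMES = ("DISPLAY", "HOME", "TMPDIR")


def leak_candidates(env=None):
    """Return the subset of env that plausibly affects the container run."""
    env = os.environ if env is None else env
    return {
        key: value
        for key, value in env.items()
        if key.startswith(LEAK_PREFIXES) or key in LEAK_NAMES
    }


def diff_envs(mine, theirs):
    """List (name, my_value, their_value) for every variable that differs."""
    names = sorted(set(mine) | set(theirs))
    return [
        (name, mine.get(name), theirs.get(name))
        for name in names
        if mine.get(name) != theirs.get(name)
    ]


# Each person runs this on the same kind of node and shares the JSON.
print(json.dumps(leak_candidates(), indent=2, sort_keys=True))
```

Loading both JSON dumps and passing them to `diff_envs` gives a short list to attach to a bug report.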

vsoch commented 4 years ago

I'll give you again complete instructions to reproduce what I did above, and you can run it by others / use for a bug report.

Container

The container is built first with Docker from the branch here. It's then pulled down to Singularity with the docker daemon. Reproducing this isn't as important because the container is provided for testing on the cluster, if someone else needs access please have them reach out to me (and you can also share the one you are using).

Connection

I make sure to connect to the cluster with ssh.

ssh -XY <username>@login.sherlock.stanford.edu

And then request an interactive node with x11 too

srun --x11 --pty bash 

I don't think this is actually necessary, but I had first cloned the branch.

Getting Code and Files

$ git clone -b croat-singularity https://github.com/researchapps/two-photon
$ cd two-photon

I also scp'd the container and the data (to unzip) to the node.

$ scp two-photon.sif <username>@login.sherlock.stanford.edu:/scratch/users/<username>/two-photon/two-photon.sif
$ scp brucker.zip <username>@login.sherlock.stanford.edu:/scratch/users/<username>/two-photon/brucker.zip

Back in the folder, I unzip brucker.zip and rename the result to overview-23 to match the tutorial.

Running the container

I chose to still provide XVFB since we are in a headless environment - that said I did see the wine GUI work fine since I had x11. I did not try it without that.

SINGULARITYENV_DISPLAY=:95 SINGULARITYENV_XVFB_RESOLUTION=320x240x8 SINGULARITYENV_XVFB_SCREEN=0 SINGULARITYENV_XVFB_SERVER=:95 singularity run --bind ${PWD}/profiles:/PROFILES --bind ${PWD}/overview-23:/data two-photon.sif

It worked as it should - I am not able to reproduce the error you are seeing, but I suspect there is some difference in our environments. I would also check that you don't have anything extra on your python path, and that you have exported the variable to unset the user site (in $HOME/.bin usually)

```
$ SINGULARITYENV_DISPLAY=:95 SINGULARITYENV_XVFB_RESOLUTION=320x240x8 SINGULARITYENV_XVFB_SCREEN=0 SINGULARITYENV_XVFB_SERVER=:95 singularity shell two-photon.sif
Singularity> env | grep XVFB
XVFB_RESOLUTION=320x240x8
XVFB_SERVER=:95
XVFB_SCREEN=0
Singularity> exit
exit
[vsochat@sh02-01n42 /scratch/users/vsochat/two-photon]$ SINGULARITYENV_DISPLAY=:95 SINGULARITYENV_XVFB_RESOLUTION=320x240x8 SINGULARITYENV_XVFB_SCREEN=0 SINGULARITYENV_XVFB_SERVER=:95 singularity run --bind ${PWD}/profiles:/PROFILES --bind ${PWD}/overview-23:/data two-photon.sif
Creating and changing into temporary directory /tmp/tmp.HmCuMp1dMF...
Setting up wine prefix...
wine: created the configuration directory '/tmp/tmp.HmCuMp1dMF/wineprefix'
0012:err:ole:marshal_object couldn't get IPSFactory buffer for interface {00000131-0000-0000-c000-000000000046}
0012:err:ole:marshal_object couldn't get IPSFactory buffer for interface {6d5140c1-7436-11ce-8034-00aa006009fa}
0012:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hres=0x80004002
0012:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, 80004002
0012:err:ole:get_local_server_stream Failed: 80004002
0014:err:ole:marshal_object couldn't get IPSFactory buffer for interface {00000131-0000-0000-c000-000000000046}
0014:err:ole:marshal_object couldn't get IPSFactory buffer for interface {6d5140c1-7436-11ce-8034-00aa006009fa}
0014:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hres=0x80004002
0014:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, 80004002
0014:err:ole:get_local_server_stream Failed: 80004002
0017:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
0017:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0017:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
0019:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:err:mscoree:LoadLibraryShim error reading registry key for installroot
0019:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 5)
0019:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 5)
0017:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
0017:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
001f:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
001f:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
001f:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
001f:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
0010:fixme:dwmapi:DwmIsCompositionEnabled 0000000006C20434
0021:fixme:iphlpapi:NotifyIpInterfaceChange (family 0, callback 0x2b5306d, context 0x5440b0, init_notify 0, handle 0x7a1fa00): stub
0010:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
005a:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
005a:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
005a:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
005a:fixme:msi:internal_ui_handler internal UI not implemented for message 0x0b000000 (UI level = 1)
0058:fixme:dwmapi:DwmIsCompositionEnabled 05E1DD14
005c:fixme:iphlpapi:NotifyIpInterfaceChange (family 0, callback 0x259f537, context 0x2a3890, init_notify 0, handle 0x6c2fce8): stub
0058:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
X connection to localhost:71.0 broken (explicit kill or server shutdown).
srun: error: _half_duplex: wrote -1 of 32
wine: configuration in L"/tmp/tmp.HmCuMp1dMF/wineprefix" has been updated.
Installing C++ libraries...
Executing mkdir -p /tmp/tmp.HmCuMp1dMF
------------------------------------------------------
You are using a 64-bit WINEPREFIX. Note that many verbs only install 32-bit versions of packages. If you encounter problems, please retest in a clean 32-bit WINEPREFIX before reporting a bug.
------------------------------------------------------
Using winetricks 20200412-next - sha256sum: 57c09343a9a09359b7f7556113f36670037a3d860848113283a36f34b9388562 with wine-5.0.2 and WINEARCH=win64
Executing w_do_call vcrun2015
Executing mkdir -p /tmp/tmp.HmCuMp1dMF
------------------------------------------------------
You are using a 64-bit WINEPREFIX. Note that many verbs only install 32-bit versions of packages. If you encounter problems, please retest in a clean 32-bit WINEPREFIX before reporting a bug.
------------------------------------------------------
Executing load_vcrun2015
Executing mkdir -p /home/users/vsochat/.cache/winetricks/vcrun2015
Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015
Downloading https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x86.exe to /home/users/vsochat/.cache/winetricks/vcrun2015
--2020-09-03 21:38:35-- https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x86.exe
Resolving download.microsoft.com (download.microsoft.com)... 104.84.227.57, 2600:1406:3c:49b::e59, 2600:1406:3c:483::e59
Connecting to download.microsoft.com (download.microsoft.com)|104.84.227.57|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13767776 (13M) [application/octet-stream]
Saving to: 'vc_redist.x86.exe'
vc_redist.x86.exe 100%[=======================================================>] 13.13M --.-KB/s in 0.1s
2020-09-03 21:38:36 (107 MB/s) - 'vc_redist.x86.exe' saved [13767776/13767776]
Executing cd /tmp/tmp.HmCuMp1dMF
------------------------------------------------------
Working around wine bug 37781
------------------------------------------------------
------------------------------------------------------
This may fail in non-XP mode, see https://bugs.winehq.org/show_bug.cgi?id=37781
------------------------------------------------------
Using native,builtin override for following DLLs: api-ms-win-crt-private-l1-1-0 api-ms-win-crt-conio-l1-1-0 api-ms-win-crt-heap-l1-1-0 api-ms-win-crt-locale-l1-1-0 api-ms-win-crt-math-l1-1-0 api-ms-win-crt-runtime-l1-1-0 api-ms-win-crt-stdio-l1-1-0 api-ms-win-crt-time-l1-1-0 atl140 concrt140 msvcp140 msvcr140 ucrtbase vcomp140 vcruntime140
Executing wine regedit /S C:\windows\Temp\override-dll.reg
Executing wine64 regedit /S C:\windows\Temp\override-dll.reg
The operation completed successfully
Setting Windows version to winxp
Executing wine regedit /S C:\windows\Temp\set-winver.reg
Executing wine64 regedit /S C:\windows\Temp\set-winver.reg
------------------------------------------------------
Running /usr/bin/wineserver -w.
This will hang until all wine processes in prefix=/tmp/tmp.HmCuMp1dMF/wineprefix terminate ------------------------------------------------------ 00dc:fixme:rpc:handle_bind_error unexpected status value 1765 0094:err:rpc:RpcAssoc_BindConnection rejected bind for reason 0 0091:fixme:service:scmdatabase_autostart_services Auto-start service L"PlugPlay" failed to start: 1053 0091:fixme:service:scmdatabase_autostart_services Auto-start service L"winebus" failed to start: 1115 0091:fixme:service:scmdatabase_autostart_services Auto-start service L"MountMgr" failed to start: 1115 Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015 Executing wine vc_redist.x86.exe /q 0009:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 0009:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 0009:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub 002e:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 002e:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 002e:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub 002e:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{74d0e5db-b326-4dae-a6b2-445b9de1836e}\\", 00000000): stub 0009:fixme:ole:CoInitializeSecurity (0032F5F4,-1,00000000,00000000,6,2,00000000,12288,00000000) - stub! 
0032:fixme:shell:SHAutoComplete stub 002e:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{74d0e5db-b326-4dae-a6b2-445b9de1836e}\\", 00000000): stub 0009:fixme:wuapi:automatic_updates_Pause 0009:fixme:sfc:SRSetRestorePointW 0032F4C8 0032F6D8 0033:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet 0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30" 0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28" 0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30" 0033:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28" 0009:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION 0031:fixme:event:wait_for_withdrawn_state window 0x20038/800001 wait timed out 0030:fixme:event:wait_for_withdrawn_state window 0x2003a/e00001 wait timed out 0009:fixme:wuapi:automatic_updates_Resume Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015 Downloading https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x64.exe to /home/users/vsochat/.cache/winetricks/vcrun2015 --2020-09-03 21:38:48-- https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x64.exe Resolving download.microsoft.com (download.microsoft.com)... 104.68.126.243, 2600:1406:3c:49b::e59, 2600:1406:3c:483::e59 Connecting to download.microsoft.com (download.microsoft.com)|104.68.126.243|:443... connected. HTTP request sent, awaiting response... 
200 OK Length: 14572000 (14M) [application/octet-stream] Saving to: 'vc_redist.x64.exe' vc_redist.x64.exe 100%[=======================================================>] 13.90M 11.2MB/s in 1.2s 2020-09-03 21:38:50 (11.2 MB/s) - 'vc_redist.x64.exe' saved [14572000/14572000] Executing cd /home/users/vsochat/.cache/winetricks/vcrun2015 Current Wine does not have Wine bug 30713, so not applying workaround Executing wine vc_redist.x64.exe /q 0037:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 0037:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 0037:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub 003f:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 003f:fixme:heap:RtlSetHeapInformation (nil) 1 (nil) 0 stub 003f:fixme:ntdll:NtQueryInformationToken QueryInformationToken( ..., TokenElevation, ...) semi-stub 003f:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{e46eca4f-393b-40df-9f49-076faf788d83}\\", 00000000): stub 0037:fixme:ole:CoInitializeSecurity (0032F5F4,-1,00000000,00000000,6,2,00000000,12288,00000000) - stub! 
0044:fixme:shell:SHAutoComplete stub 003f:fixme:advapi:DecryptFileW (L"C:\\users\\vsochat\\Temp\\{e46eca4f-393b-40df-9f49-076faf788d83}\\", 00000000): stub 0037:fixme:wuapi:automatic_updates_Pause 0037:fixme:sfc:SRSetRestorePointW 0032F4C8 0032F6D8 0043:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet 0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30" 0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28" 0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.30" 0043:fixme:wintrust:SOFTPUB_VerifyImageHash Cannot verify hash for pszObjId="1.3.6.1.4.1.311.2.1.28" 0037:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION 0042:fixme:event:wait_for_withdrawn_state window 0x20062/c00001 wait timed out 0041:fixme:event:wait_for_withdrawn_state window 0x20058/e00001 wait timed out 0037:fixme:wuapi:automatic_updates_Resume Containerizing apps directory... X connection to :99 broken (explicit kill or server shutdown). Link created Containerizing user profile... This user profile is newly generated... Detected /data folder bound! /usr/bin/python3 /app/rip.py --directory /data 2020-09-03 21:38:57.584 rip:50 INFO Ripping from: /data/Cycle00001_Filelist.txt /data/CYCLE_000001_RAWDATA_000025 2020-09-03 21:38:57.601 rip:96 INFO Waiting for ripper to finish: 3600 seconds remaining 0055:fixme:ntdll:EtwEventRegister ({5eec90ab-c022-44b2-a5dd-fd716a222a15}, 0x6ec1000, 0x6ed2030, 0x6ed2050) stub. 
0055:fixme:ntdll:EtwEventSetInformation (deadbeef, 2, 0x6ecfd70, 43) stub 0055:fixme:nls:GetThreadPreferredUILanguages 00000038, 0x750cdb4, 0x750cdd0 0x750cdb0 0055:fixme:nls:get_dummy_preferred_ui_language (0x38 0x750cdb4 0x750cdd0 0x750cdb0) returning a dummy value (current locale) 2020-09-03 21:39:07.606 rip:107 INFO Found filelist files: None 2020-09-03 21:39:07.607 rip:108 INFO Found rawdata files: None 2020-09-03 21:39:07.607 rip:109 INFO Found this many tiff files: 1 2020-09-03 21:39:07.607 rip:96 INFO Waiting for ripper to finish: 3590 seconds remaining 2020-09-03 21:39:17.617 rip:107 INFO Found filelist files: None 2020-09-03 21:39:17.618 rip:108 INFO Found rawdata files: None 2020-09-03 21:39:17.618 rip:109 INFO Found this many tiff files: 1 2020-09-03 21:39:17.618 rip:112 INFO Detected ripping is complete 2020-09-03 21:39:27.626 rip:114 INFO Killing ripper 2020-09-03 21:39:27.626 rip:116 INFO Ripper has been killed 2020-09-03 21:39:28.628 rip:88 INFO cleaned up! ```

I can confirm that the tif file is generated.

$ ls overview-23/
overview-023_Cycle00001_Ch3_000001.ome.tif  overview-023.env  overview-023.xml  References

Good luck!

chrisroat commented 4 years ago

As previously explained, I am using the stock Sherlock environment -- I have removed all my personal dotfile configs.

What's worse is that now when I run your command, I get errors about xvfb problems and about it creating files on my local system.

This user profile is newly generated...
Detected /data folder bound!
/usr/bin/python3 /app/rip.py --directory /data
xvfb-run: error: Xvfb failed to start
00b5:err:menubuilder:convert_to_native_icon error 0x80070003 creating output file L"Z:\\home\\users\\croat\\.local\\share\\icons\\hicolor\\32x32\\apps\\1E64_notepad.0.png"
00b5:err:menubuilder:convert_to_native_icon error 0x80070003 creating output file L"Z:\\home\\users\\croat\\.local\\share\\icons\\hicolor\\16x16\\apps\\1E64_notepad.0.png"

It is bothersome to me that Singularity isn't hermetic. I am fighting environment issues and some sort of problem with local files. It feels like the state of the system keeps changing under my feet. And to be fair, I also partly blame the fact that we're forced to use Windows via wine.

For the moment, I'll likely put this on hold. I'd really like to use Sherlock/OAK for this, but if I cannot get things reproducible (even in the failure mode), then I'd likely not be able to support a dozen labmates using it with their various configurations.

Maybe if I come back in the future,

vsoch commented 4 years ago

Sorry @chrisroat, I know how frustrating these things are, and I've still been unable to reproduce your issues so I feel helpless to help. It is true that it's not perfect, and adding Windows/wine to the mix makes it much harder. Perhaps you could take some time away, and come back with fresh eyes? And in the meantime, see if you have colleagues that can try to reproduce the working one?

The error hints that you still have some setting, somewhere, pointing to your user site (the .local folder in home), so I suspect that might be an issue. If you have wine installed somewhere on the cluster, and any environment variables for it, I'd clear those up. You could also look critically at what this Windows app is doing, and recreate it in a simple binary (that's probably the best solution moving forward).
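For anyone wanting to check for leftover wine settings, here is a minimal sketch that lists environment variables whose names start with `WINE` (such as the `WINEPREFIX` and `WINEARCH` values that show up in the log above). The function name `wine_related_env` and the prefix-matching rule are assumptions for illustration, not something the container or the rip script provides.

```python
import os


def wine_related_env(environ=None, prefixes=("WINE",)):
    """Return the environment variables whose names start with one of `prefixes`.

    `prefixes` is an illustrative default; extend it if other settings are suspected.
    """
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in env.items()
        if name.upper().startswith(tuple(prefixes))
    }


if __name__ == "__main__":
    # Print any wine-related variables found in the current shell environment.
    for name, value in sorted(wine_related_env().items()):
        print(f"{name}={value}")
```

An empty result only rules out variables matching the chosen prefixes; it does not say anything about files under the home directory.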

chrisroat commented 4 years ago

I'm about to lose it.

Over lunch I got this hunch it was the filesystem. My PWD was on OAK, which is Lustre. I think you were using your home directory, which is NFS. So I ran on my home directory, and it worked! I thought I was on to something, but it was a red herring....

I started trying SCRATCH and also running a script to copy the data before execution. SCRATCH (which is also Lustre, but a different configuration) seemed to work. But then I accidentally ran on OAK again.... and it worked. So I was confused.

Then I realized it -- the sample data I had set aside was marked read-only. Depending on where/how I copied it, it would remain so or not.

So in the end, that big error dump with scary "Native Crash Reporting" and no mention (that I see) of I/O is really about the ripper needing write privileges.
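Since the crash output never mentions I/O, a pre-flight check before launching the ripper could turn this into a readable error. The following is only a sketch: `find_unwritable` and `check_writable` are hypothetical helpers, not part of the existing `rip.py`, and the sketch assumes the ripper needs write access to the data directory and everything in it.

```python
import os
from pathlib import Path


def find_unwritable(directory):
    """Return paths under `directory` (including itself) the current user cannot write.

    Note: os.access reports True for root regardless of permission bits.
    """
    root = Path(directory)
    candidates = [root, *root.rglob("*")]
    return [path for path in candidates if not os.access(path, os.W_OK)]


def check_writable(directory):
    """Raise PermissionError with a readable message if anything is not writable."""
    blocked = find_unwritable(directory)
    if blocked:
        # Show at most the first ten offenders to keep the message short.
        listing = "\n  ".join(str(path) for path in blocked[:10])
        raise PermissionError(
            f"{len(blocked)} path(s) under {directory} are not writable:\n  {listing}"
        )
```

Calling `check_writable("/data")` before starting the ripper would fail fast with a list of offending paths instead of a native crash report.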

I do want people to mark their raw data as read-only, so we do make a copy prior to ripping. We will need to take care to set permissions appropriately on that copy.
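A rough sketch of that copy step, assuming the raw dataset stays read-only and only the working copy is made writable. `copy_for_ripping` is a hypothetical helper name and the choice to add just the owner write bit is an assumption; the real pipeline may want group permissions as well.

```python
import shutil
import stat
from pathlib import Path


def copy_for_ripping(raw_dir, work_dir):
    """Copy `raw_dir` to `work_dir` and add owner write permission to the copy.

    `work_dir` must not exist yet (a requirement of shutil.copytree).
    """
    work = Path(work_dir)
    # copytree preserves permission bits, so a read-only source gives a read-only copy.
    shutil.copytree(raw_dir, work)
    # Add the owner write bit to the top-level directory and everything below it.
    for path in [work, *work.rglob("*")]:
        path.chmod(path.stat().st_mode | stat.S_IWUSR)
    return work
```

The source directory is left untouched, so its read-only marking survives the copy.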

Sigh.

vsoch commented 4 years ago

What a puzzle! And such persistence! I'm so happy that you figured it out! I will make sure to note next time where I'm running things - I was using data/container/code in my scratch space, and using the sample data that you provided to me from Google Drive. I incorrectly made the assumption that all datasets were equivalent. Anyway, woohoo! And happy Friday!

tbenst commented 4 years ago

Woah, great troubleshooting @chrisroat and @vsoch!! Great job working through all of these super subtle issues!