deisseroth-lab / two-photon

Common scripts, libraries, and utilities for 2p experiments

File Ripping Particulars #12

Open jmdelahanty opened 3 years ago

jmdelahanty commented 3 years ago

Hello @chrisroat!

I'm now in the process of automating our ETL pipeline for the Tye lab and will start using some of your code soon! The first thing I wanted to do is integrate Prairie View's image ripping utility correctly but I'm not sure how to change certain parameters of the program. I noticed in rip.py that there's a section written as follows:

    # Normally, the fname is passed to -AddRawFile.  But there is a bug in the software, so
    # we have to pop up one level and use -AddRawFileWithSubFolders.
    cmd += [
        ripper,
        '-IncludeSubFolders',
        '-AddRawFileWithSubFolders',
        str(dirname),
        '-SetOutputDirectory',
        str(dirname),
        '-Convert',
    ]
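For discussion, the excerpt can be restated as a small standalone function. This is only a sketch: the name `build_rip_command` and the optional `prefix` argument (for example `("wine",)` on Linux) are assumptions for illustration and are not taken from rip.py. Only the flags shown in the excerpt are used, and none of them is known to control deletion of the raw data.

```python
from pathlib import Path
from typing import List, Sequence


def build_rip_command(ripper: Path, dirname: Path, prefix: Sequence[str] = ()) -> List[str]:
    """Assemble the ripper argument list quoted above (hypothetical helper).

    `prefix` is an assumption, e.g. ("wine",) when the ripper runs on Linux.
    """
    return [
        *prefix,
        str(ripper),
        "-IncludeSubFolders",
        "-AddRawFileWithSubFolders",
        str(dirname),  # parent directory, per the comment in the excerpt
        "-SetOutputDirectory",
        str(dirname),  # output is written next to the raw data
        "-Convert",
    ]


if __name__ == "__main__":
    # Both paths are placeholders.
    print(build_rip_command(Path("ripper.exe"), Path("session1"), prefix=("wine",)))
```

Keeping the list in one function makes it easy to print the exact command before running it on a practice dataset.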

For now, I'm hoping to retain the Raw data until I'm comfortable that the pipeline works correctly. Does the cmd list here overwrite any defaults that the utility has? It seems that the tool by default has the "Delete Original Data after creating Images?" selected so I want to make sure that's not enabled.

Any advice?

Edit: I was also wondering if it's necessary to have all the Prairie View Utilities files installed on the machine as well, or if having the Utility alone is sufficient. I'll be performing the ripping itself on a computing cluster and want to make sure I have everything the tool needs.

Thanks again for responding to me over email! I really appreciate your help and the work you've made available here. It's really going to help our system get started.

chrisroat commented 3 years ago

Hi!

We were thinking the same thing.

I've significantly rewritten a lot of the code. Here is the branch with all the updates. It now uses the option to keep the raw data around. https://github.com/deisseroth-lab/two-photon/tree/chrisroat-tiff2hdf

The README has more of the details. Of course, I suggest trying it all out on a small, practice dataset to see how things work for you.

As far as the software, we've included the necessary Prairie View code in this repo so you should not need to install the software. As proof, there is a Dockerfile that does run the ripping (via Linux+wine). However, it is very slow and I haven't figured out why yet. Likely we can build a Windows docker container for just the ripping (and use a Linux container for the rest of the pipeline), but I don't have much experience with that.

tbenst commented 3 years ago

Note that the version of the software has to match exactly (e.g. 5.5 update 4) or image ripping will choke (it raises an error; not destructive from what I've seen).

@chrisroat ripping is fast for Linux + wine when on a fast, local ext4 filesystem. I’ve had problems with software RAID0 (fails) or Lustre (sloooooow), however.

chrisroat commented 3 years ago

The software is bundled with the repo, and we've only included 5.4 and 5.5 -- what we use in the lab. And yes, the python code raises an Exception before trying anything, if your data does not match one of these. If you find that you have a different version, we can either

@tbenst - thanks for the info on RAID0/Lustre. Was your work using the built Docker or Singularity container, or your own setup? Have you tried on a Sherlock local scratch? If scratch works well on Sherlock, we can update the containers to (optionally) copy locally, rip, and copy output back.

C
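The copy-locally, rip, copy-back idea could look roughly like the sketch below. It is an illustration only: `rip_via_local_scratch`, its parameters, and the `rip` callable are hypothetical names rather than code from this repo, and `scratch_root` stands for whatever fast local directory is available.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def rip_via_local_scratch(
    src: Path,
    dst: Path,
    rip: Callable[[Path], None],
    scratch_root: Optional[Path] = None,
) -> None:
    """Copy `src` to local scratch, run `rip` there, copy new files to `dst`."""
    src, dst = Path(src), Path(dst)
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        work = Path(tmp) / src.name
        shutil.copytree(src, work)  # stage in
        before = {p.relative_to(work) for p in work.rglob("*") if p.is_file()}
        rip(work)  # caller-supplied ripping step
        for path in sorted(work.rglob("*")):  # stage out only what the rip created
            rel = path.relative_to(work)
            if path.is_file() and rel not in before:
                target = dst / rel
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, target)
```

Only files created during the rip are copied back, so the raw data is not duplicated on the shared filesystem and the original directory is left untouched.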


jmdelahanty commented 3 years ago

Thanks for your responses! I'll be trying to get the container running for the lab today and test things out.

Another thing that might be of interest for you both is that Michael Fox at Bruker told me that in early July there's going to be a new Prairie View release to 5.6! I'm not sure how you can prepare for that, but apparently it's coming soon.

Edit:

ripping is fast for Linux + wine when on a fast, local ext4 filesystem. I’ve had problems with software RAID0 (fails) or Lustre (sloooooow), however.

Could you give a quick rundown of what you mean by a "local ext4 filesystem"? I'm not sure how our cluster/servers are set up yet. All I know is that we have 10gig lines connecting things which is promising.

Edit 2: It turns out I'm not allowed to get a Docker container running on our cluster yet, it's something that our IT needs to talk to me about first to set up. So in the meantime, I'm trying to run the ripper from just the Prairie View Utilities folder. I'm running into an exception where it says it can't find one of the .dll files that's in the same folder for some reason. Does the ripper not look in the same location for the associated .dll files? Or is that part of what the python script performs?

Edit 3: I've been looking at our installation of Prairie View on our microscope's computer and it says I'm using version 5.5 Update 3, so I'm upgrading to Update 4. I wouldn't have known a new upgrade was available unless you mentioned that @tbenst! Thanks! I guess I'll have to check their website more often for those updates.

tbenst commented 3 years ago

Could you give a quick rundown of what you mean by a "local ext4 filesystem"? I'm not sure how our cluster/servers are set up yet.

I would just benchmark it and if speed is ok wouldn’t worry :)

It turns out I'm not allowed to get a Docker container running on our cluster yet

very common, you’ll have to build the singularity container. See https://github.com/deisseroth-lab/two-photon/blob/master/Singularity

I'm running into an exception where it says it can't find one of the .dll files that's in the same folder for some reason.

I’ve seen the same error with prairie view 5.5 update 4. I believe this repo has 5.5 update 2 or 3. Unfortunately it seems we need to include the files for all minor versions

jmdelahanty commented 3 years ago

I would just benchmark it and if speed is ok wouldn’t worry :)

Got it! Sounds good to me.

very common, you’ll have to build the singularity container

The IT group just got back to me and now I have Docker privileges! So I'll try to do that today.

I’ve seen the same error with prairie view 5.5 update 4. I believe this repo has 5.5 update 2 or 3. Unfortunately it seems we need to include the files for all minor versions

Is this something I could help with somehow? If I was to copy the files you have in the Utilities from a computer with update 4 installed into a new folder would that be enough to get things working?

I was also wondering: does the version the data is acquired with require the same version ripper? As in if I acquired data with 5.5 update 3 but have updated to 5.5 update 4, I now can't rip the files I made?

chrisroat commented 2 years ago

You will have to ask Bruker about compatibility. In an ideal world, 5.5 updates 3 and 4 should be compatible. They may even guarantee the 5.6 ripper could support 5.5 data (i.e. newer software can still operate on older data) -- but I'm not sure.

By the way, which dll file is missing? Can we just add it to the repo?

jmdelahanty commented 2 years ago

Sorry for the delay in response Chris! I've been busy working on some other stuff. I apparently didn't write down the particular .dll file anywhere so I'll be trying to do this again this week and find out which one was missing...

jmdelahanty commented 2 years ago

Hey everyone!

In the next couple weeks I'll actually be collecting and processing some 2P data from our Bruker Scope and I've finally gotten around to installing your container onto our cluster! Unfortunately none of the Docker enabled machines have Wine installed on them, so I'm asking our IT group to install it on a machine for me. Hopefully they can do that soon. Thanks for your patience keeping the issue open.

By the way, which dll file is missing? Can we just add it to the repo?

Still need to look into this, once I have Wine installed on a machine I'll try installing the container and running it and see what happens.

Edit: Is the repo you linked above still the one I should use or is there a different branch I should be using?

chrisroat commented 2 years ago

Hey Jeremy,

I do not think you need to have wine installed. The great thing about containers is that they have all the necessary software installed. If you are able to build the container, you should be able to run the container. (And building could be done on your own computer, where you can install docker yourself.)

HTH, C


jmdelahanty commented 2 years ago

Edit 3: In the spirit of your last advice on the NWB thread, I removed all the other comments/questions that weren't specifically relevant to this to follow best practices. I hope I haven't been annoying at all.

The container has been built and I think I'm super close!

Sorry for the spam, this will be the last edit for today (pending success which I'll update you about). Using runscript.sh seems to get me past the earlier things as well as encoding things with LF and not CRLF.

Currently I'm running into some different errors than what the README describes.

Here's where I'm at

0034:err:menubuilder:init_xdg error looking up the desktop directory
00b8:err:setupapi:create_dest_file failed to create L"C:\\windows\\system32\\ucrtbase.dll" (error=80)
010c:fixme:ver:GetCurrentPackageId (0000000009DEFE10 0000000000000000): stub
010c:fixme:iphlpapi:NotifyIpInterfaceChange (family 0, callback 0x48391e0, context 0x9af1e80, init_notify 0, handle 0x9defa30): stub
00b8:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0238:fixme:ver:GetCurrentPackageId (000000000031FDB0 0000000000000000): stub
libc++abi: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc

Any tips for crossing the finish line?

Thanks again for your patience and help Chris! This is all really cool and I'm excited to learn.

jmdelahanty commented 2 years ago

I'm happy to report some success! Thanks Tyler and Chris!

I've gotten the ripper to at least run without any errors, but the problem is that I think I might be running into the issue where things are still pretty slow on Linux/Wine (if that's still an issue for you that is!). I've let it run for a bit now and so far I'm getting zero tiff files. Interestingly when I look at htop on the computer, nearly all of the processors don't seem to be doing anything! Is this something that you have any experience with?

chrisroat commented 2 years ago

Yes, we've had similar reports. What kind of machine are you running on?

When I set it up, I mostly worked from a small file that ripped pretty fast (just a few tiff files, I think). If you have a larger test file you can share, we can investigate further.

jmdelahanty commented 2 years ago

Here's some specs since I'm not sure what would be most helpful:

CPU

jdelahanty@cheetos:/snlkt/data/bruker_pipeline$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
Stepping:              1
CPU MHz:               1200.500
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5188.09
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              40960K
NUMA node0 CPU(s):     0-15,32-47
NUMA node1 CPU(s):     16-31,48-63

RAM

jdelahanty@cheetos:/snlkt/data/bruker_pipeline$ free -h
              total        used        free      shared  buff/cache   available
Mem:           251G        1.3G        119G        2.4G        130G        246G
Swap:           56G         80K         56G

Fileserver

As far as I'm aware, it's all connected with 10Gb lines to our file directories.

OS

jdelahanty@cheetos:/snlkt/data/bruker_pipeline$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.13 (stretch)
Release:        9.13
Codename:       stretch

I do have a test set of 1k frames (I can collect a smaller one if that would be helpful), but I'm not sure how I'd share it!

chrisroat commented 2 years ago

What OS are you running? Is it your 1k test set that does not rip?

You could try sharing the rawdata file for the 1k test set on Google Drive or in a public cloud bucket.

C


jmdelahanty commented 2 years ago

Updated the above comment for OS info! Here's a link to a Google Drive.

Oh also, yes it is the 1k test set that doesn't rip.

jmdelahanty commented 2 years ago

In the meantime while I continue trying things, is it possible to copy the ripper onto a Windows VM and run it like you would with a full prairie view installation?

Turns out the answer to this is yes!

I'm also trying to see if increasing the shared memory of the container would help solve it not running...

This did not solve things for the container, but I'm pretty sure it wasn't running because it didn't have the correct minor version of the ripper installed on it. I'll be trying to fix that next week...

chrisroat commented 2 years ago

Thanks for the results on using a Windows VM. Just to confirm -- were you able to rip the 1k test set that was failing via the Docker setup? And how did you run the Windows VM (was it via Docker?)?

Regarding the minor version, the repo has 5.4 and 5.5 of the ripper and your data dump seems to be done with 5.5. Since your Windows VM test worked, is it safe to assume the repo has all the necessary files? Can you elaborate on your statement about the minor version?

[FYI, this isn't my top priority at the moment, so I won't be able to make much progress in the short term. Happy to keep commenting.]

jmdelahanty commented 2 years ago

Thanks for the results on using a Windows VM. And how did you run the Windows VM (was it via Docker?)?

No problem! I had our IT Admin set one up on the cluster for me. I'm not experienced with setting them up so I had him help. As far as I know, the VM was not built with Docker. I think it was with Virtualbox. I can ask him and double check.

Just to confirm -- were you able to rip the 1k test set that was failing via the Docker setup?

Yeah! It worked just like it does on the local microscope's machine with the GUI and everything.

Regarding the minor version, the repo has 5.4 and 5.5 of the ripper and your data dump seems to be done with 5.5. Since your Windows VM test worked, is it safe to assume the repo has all the necessary files? Can you elaborate on your statement about the minor version statement?

Sure! I'm not familiar with how to properly talk about the different versions available in software, so I'll explain what I meant.

I’ve seen the same error with prairie view 5.5 update 4. I believe this repo has 5.5 update 2 or 3. Unfortunately it seems we need to include the files for all minor versions

This is what I had meant, that it didn't have the correct "Update" Version available. I'm not sure which one is present in the repository specifically, but it looks like having a Utilities folder for each updated version is what will be required. These updates copy the outdated Prairie View setup into a new folder called Prairie 5.5.64.VERSION on our microscope's computer.

Here's the versions I have on our machine at Salk:

It appears that each of these sub-updates has its own version of a file called daq_int.dll in the Utilities folder. If you try to run the ripper on data generated with a sub-version that doesn't match (e.g. the 5.5.64.400 ripper on 5.5.64.500 data), an error is raised stating this: bruker_ripper_error

If you match the version of the Utilities folder with the version in the .env file for the images, the ripper works! I haven't put this version into the Docker container yet, but I'm planning on doing that today or tomorrow so I can make sure that's why the test wasn't running properly.

So @tbenst was right! It looks like the repository needs to include each of these different versions at a more specific version.
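A minimal sketch of what an exact-match lookup could look like, assuming the rippers are stored in one directory per full version string (for example a `5.5.64.500` folder containing `Utilities`). The function name `find_ripper_dir` and this layout are assumptions for illustration, not the repo's current code.

```python
from pathlib import Path


def find_ripper_dir(base_dir: Path, data_version: str) -> Path:
    """Return base_dir/<data_version>, requiring an exact version match."""
    base_dir = Path(base_dir)
    candidate = base_dir / data_version
    if not candidate.is_dir():
        # List what is bundled so the mismatch is easy to diagnose.
        available = sorted(p.name for p in base_dir.iterdir() if p.is_dir())
        raise FileNotFoundError(
            f"No ripper bundled for Prairie View {data_version}; available: {available}"
        )
    return candidate
```

Failing before the ripper starts, with the list of bundled versions in the message, avoids a later and less obvious error from the ripper itself.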

Would it be helpful to copy these versions in a pull request? I don't want to do that if it'll get in the way of anyone. The only thing is that we updated from 5.5 Update 5 to 5.6 Update 1. We don't have a copy of 5.5 Update 6 here. I'm not sure how we'd get that, maybe we can ask Bruker for it.

[FYI, this isn't my top priority at the moment, so I won't be able to make much progress in the short term. Happy to keep commenting.]

No problem! All of your help has been valuable and your guidance over comments is great.

chrisroat commented 2 years ago

Just to confirm -- were you able to rip the 1k test set that was failing via the Docker setup?

Yeah! It worked just like it does on the local microscope's machine with the GUI and everything.

Were you using this repo? Or did you use a more recent ripper? If the latter, then we should not be surprised it worked. The test verified that a Windows VM is the same as a workstation -- which it should be.

What I was hoping you found out was that this repo worked on a VM. But from the version mismatch errors you see below, I assume this is not true.

If you match the version of the Utilities folder with the version in the .env file for the images, the ripper works! I haven't put this version into the Docker container yet, but I'm planning on doing that today or tomorrow so I can make sure that's why the test wasn't running properly.

So @tbenst was right! It looks like the repository needs to include each of these different versions at a more specific version.

Very cool! I'm not sure why they break forward compatibility with an update release. It makes me sad, but perhaps they have documented this somewhere.

If we are lucky, Bruker will respect backward compatibility: the latest ripper update for a minor version may be able to read data produced by older acquisition system updates for the same minor version. (i.e. 5.5 500 hopefully can read 5.5 400).

If this is the case, we just have to keep the repo/image updated with the most recent update for a given minor version.

Would it be helpful to copy these versions in a pull request? I don't want to do that if it'll get in the way of anyone. The only thing is that we updated from 5.5 Update 5 to 5.6 Update 1. We don't have a copy of 5.5 Update 6 here. I'm not sure how we'd get that, maybe we can ask Bruker for it.

If you can test that the most recent update for a minor version works, you can replace the 5.4 and 5.5 folders with the most recent update you have. You can include whatever latest update you have for 5.6 as well.

If you put something on a branch and create a PR, it won't get in anybody's way.

tbenst commented 2 years ago

If we are lucky, Bruker will respect backward compatibility: the latest ripper update for a minor version may be able to read data produced by older acquisition system updates for the same minor version. (i.e. 5.5 500 hopefully can read 5.5 400).

Unbelievably, this is not the case. Must be an exact version match. A real PITA

jmdelahanty commented 2 years ago

In the meantime until I try out things in the container/give a more thorough set of answers to Chris, here's a message I received from Michael Fox from Bruker about this versioning requirement a couple weeks ago:

"5.6 like all versions before it will only rip data sets acquired in the same version. The reason being that we want the code in daq_int.dll to match exactly to the version the data was acquired with. When that does change it will likely be because we make larger changes to convert the data in real time and just get rid of the offline conversion process , entirely. There was a time in Prairie View prehistory when this wasn’t enforced and it was possible to just shred the data depending on which two versions were used."

chrisroat commented 2 years ago

[edited when I realized the version is actually 4 pieces]

Michael's statement is weaker than what we've actually found, so it might be worth confirming what we've found. He is saying the A.B release number is all that matters, but we are finding that A.B.C.D must match exactly (C always is 64?).

Maybe it's safest just to check in every minor version we have and include the full version in the directory name (right now it is just X.Y). We also will need to update the regex to expand the version to include A.B.C.D:

https://github.com/deisseroth-lab/two-photon/blob/39afef0c3f2391a64e7c28aa0f1e7ab5555f5fee/two_photon/raw2tiff.py#L177-L185
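As a rough illustration of the expanded pattern, here is a sketch that parses a full four-part version such as `5.5.64.500`. It assumes the version has already been read out of the data as a plain string; the names `VERSION_RE` and `parse_version` are hypothetical and this is not the regex currently in raw2tiff.py.

```python
import re
from typing import Tuple

# Four dot-separated integers: A.B.C.D
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> Tuple[int, int, int, int]:
    """Split an A.B.C.D version string into four integers."""
    match = VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"Expected a version like A.B.C.D, got {version!r}")
    a, b, c, d = (int(part) for part in match.groups())
    return (a, b, c, d)


if __name__ == "__main__":
    print(parse_version("5.5.64.500"))  # (5, 5, 64, 500)
```

Rejecting two-part strings such as `5.5` outright makes it harder to fall back silently to a ripper that only matches on A.B.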

jmdelahanty commented 2 years ago

Edit:

~I'm not sure what I did, but this is now fixed it seems... I'll update here with what I find about how I used it properly.~

The reason for this was that I needed to use the runscript.sh file to actually initiate the ripping process. Without this, the Wine environment isn't copied beforehand and that leads to the crash I ran into below. At least I think that's the case...

I tried to initiate a PR that has some of those small changes but have no idea if I did things properly/helpfully! In the meantime, I'm running into something new which is encouraging.

~When I start the ripping process, I get this error:~

jdelahanty@bruker-ripper:~$ source /apps/runscript.sh /tmp/specialk_cs/CSC000/test_raw/

Copying wine environment.

Executing rip. One err and four fixme statements are OK.

2021-11-15 16:04:48.373 rip:43 INFO Data created with Prairie version 5.5.64.500, using ripper: /apps/prairie_view/5.5.64.500/Utilities/Image-Block Ripping Utility.exe
2021-11-15 16:04:48.377 rip:70 INFO Ripping from:
 /tmp/specialk_cs/CSC000/test_raw/Cycle00001_Filelist.txt
 /tmp/specialk_cs/CSC000/test_raw/CYCLE_000001_RAWDATA_000000
2021-11-15 16:04:48.383 rip:117 INFO Watching for ripper to finish for 3600 more seconds
000d:err:menubuilder:init_xdg error looking up the desktop directory
0026:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0026:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.

~After this, it looks like the ripper hangs until I ctrl-c out.~

~I'm using the Dockerfile in the repo, so I'm guessing something else about my setup is going wrong. Any advice?~

chrisroat commented 2 years ago

The final 2 err statements are not normal. It shouldn't be trying to create a window.

I see the pull request and will take a look later.

jmdelahanty commented 2 years ago

Over the past couple days I've discovered a new problem with ripping in the container. Basically, Prairie View outputs XML that's malformed. The file format they declare is v1.0 in the .env files themselves, but apparently if you use their Voltage Output features for stimulation (which is what we've done here), they insert characters that aren't supported until XML v1.1. When using the standard ElementTree parser, it crashes upon running into a specific line of the .env file. I discovered this while trying to grab information out of the .env file when generating NWB files from it and getting a ParseError exception that stops things.

The workaround I found was to use the lxml library which has an option for recover=True in the parser and it will just skip past the unsupported character and read in the file's root which allows you to grab whatever info you need. So far, the line that throws the error hasn't been important.
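A minimal sketch of that workaround, assuming lxml is installed: try the standard-library parser first and fall back to lxml's recovering parser only when parsing fails. The function name `read_env_root` is hypothetical, and which attributes or elements to read from the root depends on the .env file itself.

```python
import xml.etree.ElementTree as ET


def read_env_root(path):
    """Return the root element of an XML file, tolerating malformed input."""
    try:
        return ET.parse(path).getroot()
    except ET.ParseError:
        # Imported lazily so well-formed files do not require lxml at all.
        from lxml import etree

        # recover=True asks libxml2 to skip over content it cannot parse.
        parser = etree.XMLParser(recover=True)
        return etree.parse(str(path), parser).getroot()
```

Both parsers return elements with the familiar `.get()`, `.find()` and `.attrib` interface, so the calling code can stay the same. Since recovery skips whatever could not be parsed, it is worth checking that the fields being read are not in the skipped region.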

Strangely, even if you include lxml in the dependencies in the environment.yml file used in the Dockerfile, you run into a situation where Python doesn't recognize the library when you execute rip.py from runscript.sh. If you run it just via Python, it does recognize the lxml library but the rest of the multiprocessing step fails I'm guessing because the copying of the Wine environment hasn't taken place nor has the enabling of xvfb. I've been messing around with it for a couple days without luck so far, but will keep the thread posted as I move along.
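For anyone launching the ripper from Python rather than through runscript.sh, one way to avoid the missing-display errors reported earlier in the thread is to wrap the command in `xvfb-run` when no display is set. This is a sketch under assumptions: `with_virtual_display` is a hypothetical helper, `xvfb-run` must be installed in the container, and it is not confirmed here that runscript.sh does it this way.

```python
import os
from typing import List, Mapping, Optional, Sequence


def with_virtual_display(
    cmd: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Prefix `cmd` with xvfb-run when no X display is configured."""
    env = os.environ if env is None else env
    if env.get("DISPLAY"):
        return list(cmd)  # a display already exists; run the command as-is
    # -a lets xvfb-run pick a free display number automatically.
    return ["xvfb-run", "-a", *cmd]
```

The returned list can be passed straight to `subprocess.run`. Note that this does not cover the Wine environment copy that runscript.sh also performs.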

Another thing I have discovered is that, at least on our cluster system here at Salk, we see similar performance for the Windows VM I described before and the Docker container in terms of conversion speed for smaller datasets (less than 10k images). Since I'm still fighting the issue I described above with Docker I haven't had a chance to test a large (about 43k images) conversion there. On the VM, however, performance with large datasets keeps up with local performance for about 20k images and then starts slowing down to an eventual crawl. The VM has hardware that's superior to our local machine funny enough. I've been working with our IT admin to see if I can resolve this slowdown, but there's no obvious cause that we have found yet. I figured I'd share that info as well. I'm in contact with Michael at Bruker about this to see what he says, but won't hear back probably until after this weekend I'm imagining due to Thanksgiving.

In the meantime, I'm going to try to run the ripper locally on the microscope's machine and then concatenate the OME.tiffs to H5 using the code in your repo and then copy stuff to our server with shutil at the end of the day. Hopefully I can properly unlock the power of our cluster here soon...

chrisroat commented 2 years ago

I've found Bruker output isn't always adhering to spec, which is too bad. It's also too bad they use v1.1 features, since (as you found out), most python parsers don't implement 1.1.

One thing I noticed is that runscript calls the two-photon binary, but that was renamed to 2p. This is a mistake, and I'm surprised it is working for you. What branch of the repo are you using (main is the supported branch going forward, but master is still around for people that are still using it)?

With the Windows VM, my understanding is that you are using a Prairie View install and not the code in this repo? If so, that might be getting us closer to the root cause. Hopefully Bruker (or your IT team) can shed light on what the program is doing that won't work in a container.

jmdelahanty commented 2 years ago

Hey Chris! I have some interesting updates as well as answers to your questions:

I've found Bruker output isn't always adhering to spec, which is too bad. It's also too bad they use v1.1 features, since (as you found out), most python parsers don't implement 1.1.

Definitely. It seems like they only output v1.1 some of the time. When we weren't using voltage outputs, which drive the stimulations we've implemented, we had no issues with the XML version.

One thing I noticed that runscript calls the two-photon binary, but that was renamed to 2p. This is a mistake, and I'm surprised it is working for you. What branch of the repo are you using (main is the supported branch going forward, but master is still around for people that are still using it)?

I'm calling just rip.py in the script I've been testing which isn't part of the PR I started. Since I wasn't sure if it made sense to include that small of a thing in it I figured I'd leave it out.

With the Windows VM, my understanding is that you are using a Prairie View install and not the code in this repo? If so, that might be getting us closer to the root cause. Hopefully Bruker (or your IT team) can shed light on what the program is doing that won't work in a container.

We basically just copied the ripper utility to the VM, navigated into the Utilities folder, and double-clicked the ripper. So far I've only tried it with the GUI, and it runs just as fast as on the local Windows machine until about 20k images, where it starts to slow down to a crawl. I'm still stuck on getting lxml imported properly in the container after it copies the wine environment.

Speaking of Wine environments, whenever I create a new Image, a temporary folder is generated for Wine wherever I try to run the actual ripping script. Any idea how to specify where these intermediate/temporary folders are stored when I create images?

I talked with Michael over email today and he gave me some more information about what the ripper is really up to for a few things. He gave me the following information:

  1. Memory Allocation

For the ripper to work, it must allocate 4 bytes of memory per channel per pixel. If the ripper can't find enough contiguous memory when you call it, it raises an error that requires restarting the computer. Memory is allocated just once per dataset (that is, once per Filelist.txt) by a call to a convert function in the daq_int.dll library, and it is freed afterwards.

  2. Processor Use

The ripping utility is single-core and essentially operates as a for loop iterating through the raw files found in the Filelist.txt. Therefore, allocating additional cores to an individual instance of the ripper should not improve performance. As of 11/30/21, the ripper does sometimes appear to use more than one core while operating, but it's possible that the extra usage comes from reading from and writing to disk. Michael says that "it’s basically just a giant (single threaded) loop reading/processing the raw data and writing out TIFF files."

  3. File I/O

It's possible the slowdown is caused not by the ripper's processing itself but by file I/O speeds being insufficient for what the ripper expects. I'm working with our IT to run some logging utilities on the VM while it's ripping, to see if there's some kind of backlog keeping the computer busy. I'm also going to test a new VM that our IT is setting up, hosted on a machine that has the local file system mounted.

  4. Virtualization and its consequences (Not from Michael)

It's possible that the software hooks used in Wine or in the virtualization process interfere with how Bruker's code actually executes. Our IT lead mentioned this as a possible reason things start acting strangely at some point. He also said it is still strange that things work normally at first and then slow down, but that virtualization can basically be weird.
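To make the memory figure from the Memory Allocation item concrete, here is a minimal sketch of the arithmetic. It assumes only the "4 bytes per channel per pixel" rule reported above. The function name is hypothetical, and whether "pixel" is counted per frame or over a larger block is not stated in the email, so the pixel count is left as an input.

```python
BYTES_PER_CHANNEL_PER_PIXEL = 4  # figure reported by Bruker in the email above


def ripper_buffer_bytes(n_pixels: int, n_channels: int) -> int:
    """Estimate the contiguous buffer (in bytes) the ripper would request.

    n_pixels is whatever pixel count the ripper allocates for; the thread
    does not say whether that is one frame or more, so this is an assumption.
    """
    if n_pixels < 0 or n_channels < 0:
        raise ValueError("n_pixels and n_channels must be non-negative")
    return BYTES_PER_CHANNEL_PER_PIXEL * n_channels * n_pixels


# Hypothetical example: a single 512 x 512 frame with 2 channels.
example_bytes = ripper_buffer_bytes(512 * 512, 2)
print(f"{example_bytes / 2**20:.1f} MiB")
```

Comparing this estimate against the free memory on the machine (or the container's memory limit) before starting a rip could be a cheap sanity check.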

Today I tested a different Windows machine (I'm pretty sure it's not a VM...) that is connected to the local file system and has a RAID0 NVMe SSD available and it works just as well as locally.

Lastly, I found out that many of the files currently on the repository for Bruker's Utilities are unnecessary for the ripper. All that is required is the daq_int.dll file and the ripper executable! I tested that a couple times today and it worked well. I'll update my PR to include that change.
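Since only two files seem to be needed, a small pre-flight check before launching the ripper might save some debugging. This is just a sketch: the function name is hypothetical, and the executable's file name is an assumption that may differ between Prairie View versions.

```python
from pathlib import Path

# Assumed file names; adjust to match the Prairie View version in use.
REQUIRED_FILES = ("Image Block-Ripping Utility.exe", "daq_int.dll")


def missing_ripper_files(ripper_dir):
    """Return the required file names that are not present in ripper_dir."""
    ripper_dir = Path(ripper_dir)
    return [name for name in REQUIRED_FILES if not (ripper_dir / name).is_file()]
```

An empty list means both files were found; anything else can be reported before the rip starts instead of failing partway through.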

tbenst commented 2 years ago

Re: the terrible ripping speeds on large files, I’m afraid I can report the same behavior even on Windows when ripping on NVMe drives. It appears that the ripper's run time grows quadratically with the number of TIFFs. I have reported this issue multiple times to Bruker with no resolution. I would encourage you to report the issue as well.
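One way to put numbers on the slowdown is to poll the output directory while the ripper runs and look at how many TIFFs appear per second. The sketch below is not part of the repo; the function names are hypothetical, and it assumes the ripper writes .tif/.tiff files directly into a single output directory.

```python
import time
from pathlib import Path


def count_tiffs(output_dir):
    """Count .tif/.tiff files directly inside output_dir."""
    return sum(
        1 for p in Path(output_dir).iterdir()
        if p.suffix.lower() in {".tif", ".tiff"}
    )


def sample_progress(output_dir, n_samples, interval_s=10.0):
    """Poll output_dir and return a list of (elapsed_seconds, tiff_count)."""
    start = time.monotonic()
    samples = []
    for i in range(n_samples):
        samples.append((time.monotonic() - start, count_tiffs(output_dir)))
        if i < n_samples - 1:
            time.sleep(interval_s)
    return samples


def throughput(samples):
    """Files written per second between consecutive samples."""
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        dt = t1 - t0
        rates.append((n1 - n0) / dt if dt > 0 else 0.0)
    return rates
```

If the rate stays flat, the run is behaving linearly; a rate that keeps dropping as the file count grows would be consistent with the slowdown described here. Note that listing a very large directory is itself not free, so a long polling interval is probably wise.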

jmdelahanty commented 2 years ago

Dang, good to know for sure that I'm not the only one seeing it... Thanks for the info Tyler! I mentioned it to Bruker and they basically said that the ripper shouldn't really experience a slowdown unless there's some kind of file I/O or disk write access issue.

Interestingly when I ran the ripper on a recording of 53k images today on a machine in the cluster it didn't seem to suffer any slowdown. It also never maxed out the available processors and didn't use much RAM at all. When I ran the same recording on the local machine that collected the images, it didn't seem to slow down either. There's definitely something strange going on sometimes, and I'm wondering if it's that file I/O isn't consistent on networks even if they're sufficiently fast. I don't know enough about CS/hardware to say anything about that really. I'll keep you updated as I test things out tomorrow on our VM.

chrisroat commented 2 years ago

Sounds like some good progress toward defining the problem. We have setups that work, and setups that don't. We should isolate what exactly the issue is by changing just one thing at a time while moving from "works" to "not works" (or vice versa). I don't know the exact setups, but I'll throw out some ideas you can try.

I do think it's best to always be using a local filesystem. It's possible that the non-local filesystem is slow, and just has a local cache of about 20k images that fills up quickly. This would make it look like the first 20k images are written fast, when they may not be. Can you take the setup that isn't working, and check if it's a local file system -- and if not, use the local filesystem?

For this working setup:

Today I tested a different Windows machine (I'm pretty sure it's not a VM...) that is connected to the local file system and has a RAID0 NVMe SSD available and it works just as well as locally.

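can you try some experiments:

  1. try using the ripper via Docker container, and have the direct install of Prairie View removed from the machine? [Tests whether Docker is interfering]
  2. try using the filesystem that was used in the slow setup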

For the following:

Interestingly when I ran the ripper on a recording of 53k images today on a machine in the cluster it didn't seem to suffer any slowdown.

  1. How was it run?
  2. Is it a VM?
  3. Is it a NVME disk?
  4. Is it a local filesystem?

jmdelahanty commented 2 years ago

Sorry for the delay in getting back to you! Been recovering from a COVID booster and busy imaging a bunch of mice over the past couple days.

Can you take the setup that isn't working, and check if it's a local file system -- and if not, use the local filesystem?

This setup was not connected to the local file system. Our IT is currently troubleshooting a problem with the "scratch space" we use for testing things out, which I expect to be resolved soon (hopefully today). Once that's ready I'll try it out again and report back.

try using the ripper via Docker container, and have the direct install of Prairie View removed from the machine? [Tests whether Docker is interfering]

Our IT admin needs to set up Docker on this computer for me to try this out. I asked him about it on a ticket already but he hasn't had a chance to look it over yet.

I also don't think I was clear before: this machine doesn't have a full direct installation of Prairie View on it. All it has are these two files: the Image Block-Ripping Utility.exe executable and daq_int.dll.

try using the filesystem that was used in the slow setup

Will also see if this is possible pending IT.

For the following:

Interestingly when I ran the ripper on a recording of 53k images today on a machine in the cluster it didn't seem to suffer any slowdown.

How was it run?

This machine is the working setup I described above. I ran it using the Image Block-Ripping Utility.exe from a folder containing just the executable and the daq_int.dll.

Is it a VM?

This one is not a VM.

Is it a NVME disk?

Yes, NVME RAID0.

Is it a local filesystem?

Yes, it's on the local file system.

chrisroat commented 2 years ago

Hey all. This is my last week working in lab. I can still continue to comment here, but my help will be limited.

My best thought given all this uncertainty is to add code that can optionally copy data to a local disk, and then copy the results from local disk back out to the main cluster file system.
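A minimal sketch of that idea, using only the standard library. The function and parameter names are hypothetical and not part of this repo; `process` stands in for whatever actually runs the ripper on the local copy. It needs Python 3.8+ for `dirs_exist_ok`.

```python
import shutil
import tempfile
from pathlib import Path


def process_on_local_disk(src_dir, dest_dir, process, scratch_root=None):
    """Copy src_dir to local scratch, run process(local_dir), copy results out.

    scratch_root should point at a directory on a local disk; if None, the
    system default temporary directory is used (assumed to be local).
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        local_dir = Path(tmp) / "data"
        shutil.copytree(src_dir, local_dir)  # stage inputs on local disk
        process(local_dir)                   # e.g. invoke the ripper here
        # Copy everything back, merging into dest_dir if it already exists.
        shutil.copytree(local_dir, dest_dir, dirs_exist_ok=True)
    return dest_dir
```

As written, this copies the staged raw files back out along with the results. If that round trip is too expensive, the final copy could be restricted to the newly created files.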

jmdelahanty commented 2 years ago

Good luck on your next adventure Chris!

Unfortunately, I'm still waiting for my IT manager to get me a VM on the local file system so I can test it out. I also found out today that the machine I was running the container on reaches the directory I'm writing to through an NFS mount. I think I'm now at the mercy of him giving me a writable drive on a local filesystem for this purpose...

Edit:

The compute machine I'm using for this case has a local disk that I'm allowed to use, and the container runs without slowing down when writing to it! It looks like the issue truly is the NFS mount!
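For anyone else hitting this, a quick way to check whether an output directory sits on a network mount on Linux is to look it up in /proc/mounts. The sketch below is an assumption-laden helper, not something from this repo: the function names are hypothetical, the list of network filesystem types is not exhaustive, symlinks are not resolved, and mount points containing escaped spaces are not handled.

```python
import os

# Non-exhaustive set of Linux filesystem type names that indicate a network mount.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.abspath(path) + "/"
    best_point, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if path.startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type


def is_network_fs(fs_type):
    """True if fs_type is one of the known network filesystem types."""
    return fs_type in NETWORK_FS_TYPES
```

Usage on a Linux machine would be something like `fs_type_for(output_dir, parse_mounts(open("/proc/mounts").read()))`, with a warning if `is_network_fs` returns True for the result.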