Closed yarikoptic closed 1 year ago
@yarikoptic

> on drogon have deployment of https://github.com/datalad/dandisets

Are you saying that you want to use the pre-existing deployment at /mnt/backup/dandi/dandisets/ or that you want a new clone set up somewhere? What user should this be set up under? How often should the cronjob run?

Also, I'm not sure if using fuse is going to help much, as the NWB files will still be cached locally.
@yarikoptic I do not appear to have write permission to this repository, and I cannot fork it because it is empty.
@yarikoptic

> Are you saying that you want to use the pre-existing deployment at /mnt/backup/dandi/dandisets/ or that you want a new clone set up somewhere?

A new clone, e.g. /mnt/backup/dandi/dandisets-clone/

> What user should this be set up under? How often should the cronjob run?

Under the dandi user, e.g. daily.

> Also, I'm not sure if using fuse is going to help much, as the NWB files will still be cached locally.

Through fuse we would cache only the needed blocks, not the entire NWB file. From prior experience, it would take only a few MBs of cache to open a multi-GB file. And we want to abstract away ROS3 and/or fsspec -- just have rudimentary open operations -- thus datalad-fuse sounds like a good fit here.
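To make the "only a few MBs of cache" point concrete, here is a toy illustration of block-wise caching. This is NOT datalad-fuse's or fsspec's actual implementation; the `BlockCachingReader` class and its names are invented for this sketch. It only shows why sparse reads of a multi-GB file touch a few blocks rather than the whole file.

```python
# Toy illustration (NOT datalad-fuse's implementation) of block-wise caching:
# only the blocks actually read are fetched and kept, so opening a multi-GB
# HDF5 file that mostly touches headers/metadata caches just a few blocks.

class BlockCachingReader:
    """Wrap a 'remote' byte source, fetching fixed-size blocks on demand."""

    def __init__(self, fetch_block, size, blocksize=1024 * 1024):
        self.fetch_block = fetch_block  # callable: block index -> bytes
        self.size = size
        self.blocksize = blocksize
        self.cache = {}  # block index -> bytes

    def read(self, offset, length):
        end = min(offset + length, self.size)
        out = bytearray()
        for idx in range(offset // self.blocksize, (end - 1) // self.blocksize + 1):
            if idx not in self.cache:  # fetch only blocks we have not seen yet
                self.cache[idx] = self.fetch_block(idx)
            block = self.cache[idx]
            lo = max(offset - idx * self.blocksize, 0)
            hi = min(end - idx * self.blocksize, len(block))
            out += block[lo:hi]
        return bytes(out)


if __name__ == "__main__":
    # Simulate a 1 GiB remote file; two 4 KiB reads cache only 2 MiB of it.
    blocksize = 1024 * 1024
    reader = BlockCachingReader(
        fetch_block=lambda i: bytes(blocksize), size=1024**3, blocksize=blocksize
    )
    reader.read(0, 4096)
    reader.read(512 * blocksize, 4096)
    print(len(reader.cache) * blocksize)  # bytes cached: 2 MiB, not 1 GiB
```

The same access pattern over HTTP range requests is what fsspec's caching filesystems provide in practice.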
> @yarikoptic I do not appear to have write permission to this repository, and I cannot fork it because it is empty.

You are the boss now! (I will rename it to https://github.com/dandi/dandisets-healthstatus for consistency with the older https://github.com/dandi/nwb-healthstatus)
@yarikoptic Exactly what code should be used for opening NWB files with pynwb and matnwb? For pynwb, should it just be this?

    with pynwb.NWBHDF5IO(path, "r", load_namespaces=True) as fp:
        pass

Should load_namespaces=True even be there?
@yarikoptic Should nassets in status.yaml only count NWB assets or all assets?
> @yarikoptic Exactly what code should be used for opening NWB files with pynwb and matnwb? For pynwb, should it just be this?
>
>     with pynwb.NWBHDF5IO(path, "r", load_namespaces=True) as fp: pass
>
> Should load_namespaces=True even be there?

For starters let's stick to the minimal default set of options, i.e. none, thus

    with pynwb.NWBHDF5IO(path, "r") as reader:
        nwbfile = reader.read()
        assert repr(nwbfile)
        assert str(nwbfile)

which mimics our test_pynwb_io but without load_namespaces=True. But WDYT @bendichter -- should it be like this, or should we start with load_namespaces=True?
@yarikoptic

> in each dandiset folder have subfolder logs with {last_run}_{test.name}_errors.log with output from errored out runs. That subfolder should be git ignored for now

Exactly what do you mean by "errored out runs"? Failed tests? Update processes that exited nonzero?
> @yarikoptic Should nassets in status.yaml only count NWB assets or all assets?

All. Eventually I hope we will add tests for zarrs as well.
> Exactly what do you mean by "errored out runs"? Failed tests? Update processes that exited nonzero?

Note: each execution of the test should be a separate python (or matlab) run (due to all the pynwb versions, schema treatments, etc.). So it would be for failed tests. For failed updates (after which there is no point in running the tests anyway), use {last_run}_updates_errors.log.
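The one-process-per-test scheme with that log naming can be sketched as follows. This is a hypothetical helper, not the project's actual implementation; the function name and directory layout are invented, while the {last_run}_{test_name}_errors.log convention follows the comment above.

```python
# Hypothetical sketch: run each test in its own process and write
# <last_run>_<test_name>_errors.log only when the run fails. Only the log
# naming convention comes from the discussion above; the rest is illustrative.
import subprocess
import sys
from pathlib import Path


def run_test(logdir: Path, last_run: str, test_name: str, argv: list) -> bool:
    """Run one test in a fresh process; on failure, save its output to a log."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        logdir.mkdir(parents=True, exist_ok=True)
        logfile = logdir / f"{last_run}_{test_name}_errors.log"
        logfile.write_text(proc.stdout + proc.stderr)
        return False
    return True


if __name__ == "__main__":
    import tempfile

    logs = Path(tempfile.mkdtemp())
    # A deliberately failing "test", run in a separate Python process:
    ok = run_test(
        logs, "20230101", "pynwb_open",
        [sys.executable, "-c", "import sys; sys.exit('boom')"],
    )
    print(ok, sorted(p.name for p in logs.iterdir()))
```

Running each pynwb (or matlab) open-test through such a wrapper keeps schema/namespace state from leaking between files, per the note above.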
I think we should test with load_namespaces=True. Otherwise, this will fail for any NWB file that contains an extension.
@yarikoptic So you want each test to run in a separate process? Are you saying that you want the output from these processes to just be run together in a single logfile per update per Dandiset?
> @yarikoptic So you want each test to run in a separate process?

Yes.

> Are you saying that you want the output from these processes to just be run together in a single logfile per update per Dandiset?

Yes.
> I think we should test with load_namespaces=True. Otherwise, this will fail for any NWB file that contains an extension.

Eh, ok. @jwodder let's then start with pynwb_open_load_ns (not just plain pynwb_open), which would have load_namespaces=True ;-)
@jwodder I can answer questions you have for MatNWB.
To answer your question:

> Exactly what code should be used for opening NWB files with ... matnwb?

Using

    nwb = nwbRead('filename.nwb', 'savedir', '.');

should be enough to test. savedir sets the location to save the generated namespace modules and should be cleared before reading every NWB file to cleanly separate generated class environments.
@lawrence-mbf I assume that nwbRead is imported with from matnwb import nwbRead? What is the meaning of the third argument to that function?
@jwodder

> I assume that nwbRead is imported with from matnwb import nwbRead?

There is no such syntax in MATLAB, so it will really depend on what FFI you end up using in your codebase. In MATLAB, all you would have to do is have the nwbRead.m script file in your MATLAB path.

> What is the meaning of the third argument to that function?

savedir is a keyword argument, so it's implied to be paired with the third argument, which here just indicates the current MATLAB directory.
@yarikoptic I was under the impression that there was a Python API for matnwb. How exactly should this test be run?
@jwodder I believe the simplest way is to run bash commands through python. Another alternative is the MATLAB engine API for Python
> @yarikoptic I was under the impression that there was a Python API for matnwb. How exactly should this test be run?

I thought just via external command execution, like matlab -nojvm -nodesktop -nosplash testcode.m
@yarikoptic

- What exactly should the contents of testcode.m be? Just @lawrence-mbf's suggestion of nwb = nwbRead('filename.nwb', 'savedir', '.'); (but with '.' replaced with a per-asset temporary directory)?
- Can testcode.m be replaced with - to read from standard input?
- How exactly should the cronjob update to the latest version of matnwb?
@lawrence-mbf could you please provide @jwodder with a complete, usable sample testcode.m, the way to invoke it with matlab without requiring a GUI or interaction, and the preferred way to update matnwb to the most recent released version?
@jwodder
In fact, you don't even need to run a script per se to test reading. In bash you could just run:

    matlab -batch "nwb = nwbRead('file.nwb', 'savedir', '.');"

though you could replace everything in quotes with just "testcode" if you wish to place it in a function:

    matlab -batch "testcode('filename.nwb');"

with testcode.m containing:

    function testcode(filename)
        nwb = nwbRead(filename, 'savedir', '.');
    end

MATLAB will even return a nonzero exit code if it errors out.
As for getting the most recently released version, I believe the easiest would be to query GitHub using this API: https://docs.github.com/en/rest/releases/releases#get-the-latest-release
And grabbing the files that way.
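The "latest release" lookup suggested above could be sketched like this. The endpoint URL and the "tag_name"/"tarball_url" JSON fields come from GitHub's REST API; `pick_download` and `fetch_latest` are names invented for this sketch, and the actual cronjob code may look nothing like it.

```python
# Sketch of finding the latest MatNWB release via GitHub's
# "get the latest release" REST endpoint linked above. The JSON parsing is
# separated from the network call so the latter stays optional.
import json
import urllib.request

LATEST_URL = "https://api.github.com/repos/NeurodataWithoutBorders/matnwb/releases/latest"


def pick_download(release: dict) -> tuple:
    """Return (tag, tarball_url) from a GitHub release JSON object."""
    return release["tag_name"], release["tarball_url"]


def fetch_latest(url: str = LATEST_URL) -> dict:
    """Fetch the release JSON for the latest release (network access needed)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

A cronjob would then download and unpack the tarball_url, replacing the previous matnwb checkout.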
@lawrence-mbf What if I want to pass the NWB path and savedir path as extra arguments on the command line rather than having to mess with interpolation into a MATLAB code string? How would I do that?
> And grabbing the files that way.

And then what?
@yarikoptic Should datalad-fuse mount the FUSE directory in a specific path? Is it safe to mount FUSE at a temporary directory under /tmp?
@jwodder

> What if I want to pass the NWB path and savedir path as extra arguments on the command line ...

Then I suppose you would have to use a config file placed in the same directory as the script. Something like this: testcode.zip
Inside the above zip file is a testcode.m file which should be run without arguments (see below). In the same directory should be the testcode_config file, which is just an ASCII file with two lines of data (separated by newlines), in order:
1) the path to the NWB file
2) the savedir directory
There's an example testcode_config provided. Hopefully that works around the need to interpolate strings.
> And then what?

Assuming you have MATLAB installed, you would:
1) unzip the files in some directory
2) add the directory to the MATLABPATH environment variable
3) call the command line command matlab -batch "testcode" -sd <testcode-dir>
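The config-file workaround above could be driven from Python along these lines. This is a hypothetical helper: the two-line testcode_config layout and the matlab -batch "testcode" -sd invocation are taken from the comments above, while the function names are invented.

```python
# Hypothetical driver for the config-file workaround: write the two-line
# testcode_config (NWB path, then savedir) next to testcode.m and build the
# non-interactive MATLAB invocation. A sketch, not the shipped code.
from pathlib import Path


def write_config(testcode_dir: Path, nwb_path: str, savedir: str) -> Path:
    """Write testcode_config: line 1 = NWB file path, line 2 = savedir."""
    cfg = Path(testcode_dir) / "testcode_config"
    cfg.write_text(f"{nwb_path}\n{savedir}\n")
    return cfg


def matlab_command(testcode_dir: Path) -> list:
    """Command line to run testcode.m without a GUI, per the steps above."""
    return ["matlab", "-batch", "testcode", "-sd", str(testcode_dir)]
```

One would call write_config() with a per-asset temporary savedir before each run, then pass matlab_command() to subprocess.run.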
> @yarikoptic Should datalad-fuse mount the FUSE directory in a specific path?

Maybe let's use some "project" top-level directory like /mnt/backup/dandi/dandisets-healthstatus/ where we would have dandisets (the aforementioned clone of all dandisets) and dandisets-fuse (the mount point to be reused).

> Is it safe to mount FUSE at a temporary directory under /tmp?

It might work, but probably best to keep things near.
BTW -- matlab is now available on drogon:
(base) dandi@drogon:~$ matlab -nojvm -nodesktop
MATLAB is selecting SOFTWARE OPENGL rendering.
< M A T L A B (R) >
Copyright 1984-2022 The MathWorks, Inc.
R2022b Update 1 (9.13.0.2080170) 64-bit (glnxa64)
September 28, 2022
For online documentation, see https://www.mathworks.com/support
For product information, visit www.mathworks.com.
>>
I can't get matnwb to work. I've downloaded & unpacked the source code for the latest release in matnwb-2.5.0.0. If I try running:
export MATLABPATH="$PWD"/matnwb-2.5.0.0
matlab -nojvm -nodesktop -r 'matnwb_nwbRead sub-RAT123.nwb foo'
where matnwb_nwbRead.m
contains:
    function []=matnwb_nwbRead(a,b)
        nwb = nwbRead(a, 'savedir', b);
    end
then I get:
Unable to resolve the name 'types.core.Subject'.
Error in io.parseGroup (line 85)
parsed = eval([Type.typename '(kwargs{:})']);
Error in io.parseGroup (line 38)
subg = io.parseGroup(filename, group, Blacklist);
Error in io.parseGroup (line 38)
subg = io.parseGroup(filename, group, Blacklist);
Error in nwbRead (line 59)
nwb = io.parseGroup(filename, h5info(filename), Blacklist);
Error in matnwb_nwbRead (line 2)
nwb = nwbRead(a, 'savedir', b);
If I try running the commands listed here, then at the generateCore() step I get:
Unable to resolve the name
'com.mathworks.jmi.ClassLoaderManager.getClassLoaderManager'.
Error in javaclasspath>local_javapath (line 137)
jloader = com.mathworks.jmi.ClassLoaderManager.getClassLoaderManager;
Error in javaclasspath (line 65)
p = local_javapath('-dynamic');
Error in javaaddpath (line 71)
javaclasspath( p, javaclasspath );
Error in spec.loadSchemaObject (line 9)
javaaddpath(java_loc);
Error in spec.generate (line 6)
Schema = spec.loadSchemaObject();
Error in generateExtension (line 47)
Namespaces = spec.generate(namespaceText, localpath);
Error in generateCore (line 51)
generateExtension(commonPath, varargin{:});
@yarikoptic

> update full hierarchy: run datalad update -d dandisets --follow parentds --how=ff-only -J5 -r -R 1

That command fails with "unknown argument: -J5".
@jwodder Please remove -nojvm; MatNWB does leverage the JVM for YAML parsing.
@yarikoptic Exactly what command do I run to create the clone of all Dandisets without cloning any Zarrs? Note that just datalad clone ...
is insufficient, as it leaves the subdatasets empty, and thus they remain empty in the FUSE mount.
You should be able to use datalad install -r -R 1, BUT I think we better:

- datalad clone (which doesn't do recursive)
- after datalad update, do datalad get -d . -r -R 1 -J5 -n to ensure that all dandisets are installed, since update would not install new subdatasets anyways IIRC -- it would only update the ones already installed.

> > update full hierarchy: run datalad update -d dandisets --follow parentds --how=ff-only -J5 -r -R 1
>
> That command fails with "unknown argument: -J5".

Sorry about that -- apparently not yet implemented :-/ https://github.com/datalad/datalad/issues/6351 so no -J for update.
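The resulting install-then-update sequence could be sketched as the argv lists a cron script might run. This is illustrative only: the commands mirror the discussion above (get with -J5 -n to install new subdatasets, update without -J), but the path and the `update_commands` helper are assumptions, not the actual cron script.

```python
# Sketch of the update sequence discussed above, as argv lists. Printed rather
# than executed here, since datalad and the clone layout are host-specific;
# the CLONE path is illustrative.
CLONE = "/mnt/backup/dandi/dandisets-healthstatus/dandisets"


def update_commands(clone: str = CLONE) -> list:
    return [
        # install any new subdatasets (update alone won't); -n = no file content
        ["datalad", "get", "-d", clone, "-r", "-R", "1", "-J5", "-n"],
        # fast-forward the full hierarchy; no -J: update lacks that option
        ["datalad", "update", "-d", clone, "--follow", "parentds",
         "--how=ff-only", "-r", "-R", "1"],
    ]


if __name__ == "__main__":
    for argv in update_commands():
        print(" ".join(argv))
```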
@yarikoptic datalad-fuse keeps crashing with errors of the form:
Uncaught exception from FUSE operation open, returning errno.EINVAL.
Traceback (most recent call last):
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fuse.py", line 734, in _wrapper
return func(*args, **kwargs) or 0
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fuse.py", line 834, in open
fi.fh = self.operations('open', path.decode(self.encoding),
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/datalad_fuse/fuse_.py", line 75, in __call__
return super(DataLadFUSE, self).__call__(op, self.root + path, *args)
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fuse.py", line 1076, in __call__
return getattr(self, op)(*args)
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/datalad_fuse/fuse_.py", line 203, in open
fsspec_file = self._adapter.open(path)
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/datalad_fuse/fsspec.py", line 243, in open
return dsap.open(relpath, mode=mode, encoding=encoding, errors=errors)
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/datalad_fuse/fsspec.py", line 169, in open
return self.fs.open(url, mode, **kwargs)
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 406, in <lambda>
return lambda *args, **kw: getattr(type(self), item).__get__(self)(
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fsspec/spec.py", line 1034, in open
f = self._open(
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 406, in <lambda>
return lambda *args, **kw: getattr(type(self), item).__get__(self)(
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 345, in _open
self.save_cache()
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 406, in <lambda>
return lambda *args, **kw: getattr(type(self), item).__get__(self)(
File "/home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 161, in save_cache
cached_files = pickle.load(f)
EOFError: Ran out of input
Please file/clarify with https://github.com/fsspec/filesystem_spec/issues -- smells like a bug on their end. Is that file entirely empty or does it have something? Is it still open by some process that's writing to it, etc.?
@yarikoptic Issue filed: https://github.com/fsspec/filesystem_spec/issues/1107
THANKS! You wrote:

> I suspect the cause is that the cached filesystem is not safe for concurrent access, with the result that multiple actions on the filesystem are causing the cache file to be read by one procedure while another writes to it.

I see that we are already doing some thread locking in https://github.com/datalad/datalad-fuse/blob/HEAD/datalad_fuse/fuse_.py#L214 but I guess, as a workaround (unless fixed in fsspec quickly), we might need to add that locking in the else: part there at https://github.com/datalad/datalad-fuse/blob/HEAD/datalad_fuse/fuse_.py#L221 ?
@yarikoptic I don't think putting a lock just there would be enough. We'd likely have to wrap a lock around every operation that goes through fsspec.
ok, let's give the fsspec folks a day, but then we would need to try it out to progress forward. It seems there are not that many such code points anyways.
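The "wrap a lock around every operation that goes through fsspec" idea has a simple generic shape, sketched below. This is NOT datalad-fuse's actual code; `LockedProxy` is an invented name, and the real fix may be structured quite differently.

```python
# Sketch of the workaround discussed above: serialize every call into the
# (not concurrency-safe) cached filesystem behind one shared lock.
import threading


class LockedProxy:
    """Proxy that holds a shared lock around every method call of `target`."""

    def __init__(self, target, lock=None):
        self._target = target
        self._lock = lock or threading.RLock()

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            with self._lock:  # one filesystem operation at a time
                return attr(*args, **kwargs)

        return locked
```

Wrapping the fsspec filesystem object once, instead of locking each call site, avoids missing a code path like the else: branch mentioned above.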
@yarikoptic I just finished running the healthchecks using https://github.com/datalad/datalad-fuse/pull/83 and https://github.com/fsspec/filesystem_spec/pull/1111. The good news is that there was only one error, caused by an HTTP problem, and it didn't seem to affect anything. Also, all the tests passed (and I know the test-handling is correct because some datalad-fuse issues caused some test failures in a previous run). The bad news is that the script took over three days to run, largely because iterating over a datalad-fuse directory involves the kernel making a stat(2)
call on each file, which means a getattr
call, which means making an HTTP request for each file.
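One obvious mitigation for the stat-per-file slowdown described above is to memoize getattr results, sketched here. The TTL scheme and `AttrCache` class are purely illustrative assumptions, not datalad-fuse's design (that discussion lives in the issue filed below).

```python
# Sketch of memoizing getattr so that listing a directory doesn't trigger one
# HTTP request per file. TTL and structure are illustrative only.
import time


class AttrCache:
    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self.fetch = fetch      # expensive lookup, e.g. one HTTP request
        self.ttl = ttl
        self.clock = clock
        self._cache = {}        # path -> (timestamp, attrs)

    def getattr(self, path):
        now = self.clock()
        hit = self._cache.get(path)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # served from cache: no HTTP request
        attrs = self.fetch(path)
        self._cache[path] = (now, attrs)
        return attrs
```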
This is indeed good news and awesome! Thank you @jwodder !
https://github.com/fsspec/filesystem_spec/pull/1111 -- let's finalize it! (docstring + take out of the draft)
https://github.com/datalad/datalad-fuse/pull/83 -- same there, left a comment
how much data was cached locally?
A 3-day run is indeed "suboptimal", and it is great that you pinpointed the main slowdown. Filed a dedicated https://github.com/datalad/datalad-fuse/issues/84 for it.
Let's consider this issue resolved, thanks again @jwodder, and I filed dedicated #2 and #4 for the next steps on this one.
@yarikoptic

> how much data was cached locally?

The total size of all */.git/datalad/cache directories in the dandi/dandisets clone is 949,177,524 kB.
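The figure above can be reproduced with a walk over the clone summing the file sizes under each cache directory. This sketch is an assumption about the measurement, not the command actually used; `cache_kb` is an invented helper.

```python
# A way to reproduce the figure above: walk the clone and sum file sizes under
# every */.git/datalad/cache directory, reported in kB.
import os
from pathlib import Path


def cache_kb(clone_root) -> int:
    total = 0
    for cache in Path(clone_root).glob("*/.git/datalad/cache"):
        for dirpath, _dirs, files in os.walk(cache):
            for name in files:
                total += os.path.getsize(os.path.join(dirpath, name))
    return total // 1024
```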
@lawrence-mbf is there some more lightweight way to do basic "open nwb testing" with matnwb? In our preliminary testing in https://github.com/dandi/dandisets-healthstatus/pull/3#issuecomment-1324029002 nwbRead
might be too heavy to allow for scalable testing.
Not currently. It might be worth profiling to see what the bottleneck here is. I'm not familiar with what FUSE is but if it's using ROS3 it's most likely that driver.
That timing, if I got it right, was on a fully downloaded file... a sizeable one though (11GB). It really does just take that long!
I'll take a look if something can be done.
Recap of the plan:

- update the full hierarchy: run datalad update -d dandisets --follow parentds --how=ff-only -J5 -r -R 1 (with -R 1 we should not get to upgrade zarr subdatasets for now)
- pynwb_open: operate on a .nwb file and open it using pynwb
- matnwb_nwbRead: operate on a .nwb file and open it using matnwb
- in each dandiset folder have a subfolder logs with {last_run}_{test.name}_errors.log containing output from errored out runs. That subfolder should be git ignored for now