Closed: luiztauffer closed this issue 12 months ago
I've seen this before. It's a tough one because it seems to be internal to h5py. May be an issue with remfile somehow.
@magland can we optionally use fsspec (remfile being the default) in order to test that?
> @magland can we optionally use fsspec (remfile being the default) in order to test that?
That's a good idea. It would be nice to isolate a reproducible example of the error outside of a dendro job.
I will add an option to use fsspec as an alternative to remfile (as discussed in our meeting).
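For context, here is a minimal sketch of what switching between the two streaming backends could look like. Both remfile and fsspec are real libraries, but the `open_remote_nwb` helper and its `use_fsspec` flag are illustrative, not an existing dendro API:

```python
import h5py

def open_remote_nwb(url: str, use_fsspec: bool = False) -> h5py.File:
    """Open a remote HDF5/NWB file for streaming reads (hypothetical helper)."""
    if use_fsspec:
        import fsspec
        # fsspec's HTTP filesystem returns a file-like object h5py can wrap
        f = fsspec.filesystem("http").open(url, "rb")
    else:
        import remfile
        # remfile lazily fetches byte ranges over HTTP
        f = remfile.File(url)
    return h5py.File(f, "r")
```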
@luiztauffer when you have a chance, could you share with me a link to the project where this happened? Hopefully this is reproducible and we can track down the problem.
EDIT: nvm, I found it.
@luiztauffer I realized something important in this example. The elapsed time was 3602.975 seconds (just over one hour), which almost certainly means that the issue was caused by an expired download URL for an embargoed dandiset. I assume (hope) that this was running using a ks2.5 app that was built before I made modifications to enable auto-renewing of the download URL.
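For illustration, here is a hypothetical sketch of the auto-renewal idea: re-request the presigned URL before its roughly one-hour lifetime runs out. `get_presigned_url` is a placeholder for whatever call the app uses to obtain the URL; it is not a real dandi or dendro function:

```python
import time

URL_TTL_SECONDS = 3500  # renew slightly before the ~3600 s expiry

class RenewingUrl:
    """Hypothetical wrapper that transparently refreshes an expiring URL."""

    def __init__(self, get_presigned_url):
        self._get_url = get_presigned_url  # callable returning a fresh URL
        self._url = None
        self._fetched_at = 0.0

    @property
    def url(self) -> str:
        # Re-fetch if we have no URL yet or the cached one is about to expire
        if self._url is None or time.time() - self._fetched_at > URL_TTL_SECONDS:
            self._url = self._get_url()
            self._fetched_at = time.time()
        return self._url
```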
On a related topic for this dataset... the download was taking a very long time because the chunking of the elec series is very inefficient. The data need to be re-uploaded with the latest chunking settings in neuroconv.
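As a quick check, h5py exposes a dataset's chunk shape directly, so the chunking of an electrical series can be inspected before deciding whether a file needs re-uploading. The file name is hypothetical and the dataset path, while typical for NWB, varies by file:

```python
import h5py

with h5py.File("sub-xx_ses-yy.nwb", "r") as f:  # placeholder file name
    ds = f["acquisition/ElectricalSeries/data"]
    print("shape:      ", ds.shape)
    print("chunks:     ", ds.chunks)       # e.g. (frames_per_chunk, n_channels)
    print("compression:", ds.compression)
    if ds.chunks is not None:
        chunk_bytes = ds.dtype.itemsize
        for dim in ds.chunks:
            chunk_bytes *= dim
        # Tiny chunks mean many HTTP range requests when streaming
        print("chunk size: %.1f KiB" % (chunk_bytes / 1024))
```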
> I assume (hope) that this was running using a ks2.5 app that was built before I made modifications to enable auto-renewing of the download URL.
Ok let's hope so! I'll try that again later with the latest version of the App
> On a related topic for this dataset... the download was taking a very long time because the chunking of the elec series is very inefficient. The data need to be re-uploaded with the latest chunking settings in neuroconv.
Yes, we should re-upload this with the improved chunking format, but in general we shouldn't count on every file's chunking being efficient. Can we have an `eager_loading` option for these apps as well? Would it make sense to have this as a feature of InputFile?
> Ok let's hope so! I'll try that again later with the latest version of the App
I wouldn't try it with this example. It took 1 hr to download only 10% of the data. I think we need to assume reasonable chunking.
> Yes, we should re-upload this with the improved chunking format, but in general we shouldn't count on every file's chunking being efficient. Can we have an `eager_loading` option for these apps as well? Would it make sense to have this as a feature of InputFile?
What do you mean by `eager_loading`? Do you mean downloading the entire .nwb file up-front? We can do that, but do you think it should be a parameter of the processor?
Yes, downloading the whole file instead of streaming; maybe "eager" is not the best word there. And yes, a parameter of the processors, but the download code could be a feature of InputFile, if that makes sense?
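A rough sketch of what that could look like on InputFile; the class and method names here are hypothetical, not dendro's actual API:

```python
from typing import Optional
import shutil
import urllib.request

class InputFile:
    """Hypothetical sketch of an InputFile with an optional pre-download."""

    def __init__(self, url: str):
        self.url = url
        self._local_path: Optional[str] = None

    def download(self, dest: str) -> str:
        # Eagerly fetch the whole file to local disk (the "pre-download" option)
        with urllib.request.urlopen(self.url) as r, open(dest, "wb") as out:
            shutil.copyfileobj(r, out)
        self._local_path = dest
        return dest

    def get_path_or_url(self) -> str:
        # Processors stream from the URL unless a local copy exists
        return self._local_path or self.url
```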
Adding this note here. Another reason we don't want the eager (pre-download) option to be the default is that we want to be able to efficiently process just a time segment of the dataset.
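To make that concrete: with lazy streaming, a processor can slice just the window it needs, and only the chunks covering that window are fetched. The URL, dataset path, and numbers below are placeholders:

```python
import h5py
import remfile

url = "https://example.org/presigned-download-url"  # placeholder
f = h5py.File(remfile.File(url), "r")
data = f["acquisition/ElectricalSeries/data"]       # typical NWB location

rate = 30000                      # assumed sampling rate (Hz)
start_s, end_s = 60, 120          # process only seconds 60-120
segment = data[start_s * rate : end_s * rate, :]    # fetches only those chunks
```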
@luiztauffer I tried to run this again with the pre-download option, and here's what happened:

- The download took 2.7 hours (160 GB, roughly 16 MB/s).
- The next step, creating an int16 binary recording file by extracting the electrical series from the .nwb file, had only reached 67% when the job's 5-hour timeout expired. It was this slow even with the file on disk because the chunking issue also applies to local reads.

We can discuss this more, but I think the bottom line is: it's very important to ensure that electrical series have sensible chunking.
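For reference, a sketch of what "sensible chunking" could mean when writing with h5py: chunks that span all channels and are contiguous in time, on the order of 1 MiB, so that reading a time window touches few chunks. The numbers are illustrative, not neuroconv's exact defaults:

```python
import numpy as np
import h5py

n_frames, n_channels = 30000 * 600, 384            # e.g. 10 min at 30 kHz
frames_per_chunk = 1_000_000 // (n_channels * 2)   # ~1 MiB of int16 per chunk

with h5py.File("out.nwb", "w") as f:               # placeholder file name
    f.create_dataset(
        "acquisition/ElectricalSeries/data",
        shape=(n_frames, n_channels),
        dtype=np.int16,
        chunks=(frames_per_chunk, n_channels),     # time-contiguous chunks
        compression="gzip",
    )
```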
The problem seems to have been solved with proper chunking of the data at the conversion step!
Job id: 7ac261d9
error traceback:
@magland