NileGraddis opened 5 years ago
adding @rly to the convo as well. He's new to the dev team and has already been very helpful with external link issues.
I'd say these are both jobs for external links. I'd like to build infrastructure around them so they work for you in both of these use cases. Let's home in on the precise failure modes that are preventing you from using them this way and see if we can resolve them for you.
Also I believe we have a more robust solution in the works that would interface directly with the AllenSDK, but that's further down the road.
We should talk some more about this during the hackathon. I agree that 1 would be the least optimal option. 2, I think, should be doable, since the main problem is having a workaround for dealing with broken external links in PyNWB. 3 is something that we have on the roadmap. The idea is to support "foreign fields" (basically web-based external links). We have not started work on this yet, but we plan to do some planning for it during the hackathon. As such, I can't give you an exact timeline yet, but I would hope for something in the 6-12 month timeframe; we'll have to see.
I think it would make sense to look at fixing 2 during the hackathon to get you going for now and at the same time start planning the roadmap for 3.
@bendichter @oruebel Thank you for the swift response. I like the idea of fixing 2 soonish (hackathon time sounds good) and working on 3 longer-term. I will post an example file and some code in this thread.
@rly Hi!
working on 2 now.
@oruebel @ajtritt @rly
I made some progress on this during the hackathon, but mainly in the direction of running into more problems :P. Here is what I've tried (the overall goal is to store large LFP data in satellite files):
Both of these require subclassing HDMFDataset so that reading the file does not immediately choke on construction failures (and so that failure on access provides useful information).
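To make the intent concrete, here is a minimal sketch of that deferred-failure behavior in plain Python. The class name and wiring are illustrative, not actual HDMFDataset/PyNWB API — the point is just that constructing the object succeeds even when the link target is broken, and the original error is replayed with context only on access:

```python
class DeferredDataset:
    """Stand-in for a dataset whose external link could not be resolved.

    Construction succeeds even when the link target is missing; the
    original error is replayed, with added context, only when the data
    are actually accessed.
    """

    def __init__(self, link_path, construction_error):
        self._link_path = link_path
        self._error = construction_error

    def __getitem__(self, key):
        raise RuntimeError(
            f"external link {self._link_path!r} could not be resolved; "
            f"original error: {self._error}"
        ) from self._error


# Reading the file succeeds; only touching the data raises:
broken = DeferredDataset("lfp.nwb:/acquisition/lfp", OSError("file not found"))
try:
    broken[0]
except RuntimeError as e:
    print(e)
```

The real version would need to hook into the object-mapping/construction machinery so that the error raised at construct time is captured rather than propagated.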
Anyways, this seems like a pretty hard problem and one that I don't have the bandwidth to tackle alone (though am of course happy to help out). Maybe you guys have some ideas? This issue is pretty important for our upcoming data release - we can't really have people downloading 10 GB files just to access the units table.
Here is a use pattern that I'm interested in. I don't know whether it is currently supported by pynwb or whether there is a better way to accomplish the same goals, so I'll just lay it out here.
Setting: The Allen Institute serves data over a public HTTP API. The AllenSDK contains code for making these queries and caching the results locally. For our physiology projects, we are working on serving these data as NWB 2.0 files.
The data access and caching behavior in the AllenSDK is generally lazy - users only have to download data that they ask for.
Problem: How should I implement lazy downloading for NWB 2.0-formatted data? Here are some example use cases:
Potential solutions
@ajtritt @oruebel @bendichter Is this a terrible idea (or already implemented in a way I don't know about, or already on the roadmap)? How can I support incremental download?
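One shape incremental download could take, independent of any NWB-level support, is fetching only the byte ranges a reader actually touches (e.g. via HTTP `Range` requests against the hosted HDF5 file). A toy sketch, where `fetch_range` is a stand-in for the real ranged request and all names are illustrative:

```python
class RangeReader:
    """Sketch of incremental download: only the fixed-size blocks a
    reader actually touches are fetched, then kept in memory.

    `fetch_range` stands in for an HTTP request with a
    `Range: bytes=start-(stop-1)` header (start, stop) -> bytes.
    """

    def __init__(self, size, fetch_range, block=4096):
        self.size = size
        self.fetch_range = fetch_range
        self.block = block
        self.blocks = {}  # block index -> cached bytes

    def read(self, offset, length):
        out = bytearray()
        end = min(offset + length, self.size)
        while offset < end:
            i = offset // self.block
            if i not in self.blocks:  # fetch each block at most once
                start = i * self.block
                stop = min(start + self.block, self.size)
                self.blocks[i] = self.fetch_range(start, stop)
            within = offset - i * self.block
            take = min(end - offset, self.block - within)
            out += self.blocks[i][within:within + take]
            offset += take
        return bytes(out)
```

Reading the units table out of a 10 GB file would then cost only the blocks the HDF5 library actually touches, not the whole file.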