NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io
Other
52 stars 16 forks source link

make session_start_time optional #168

Closed bendichter closed 4 years ago

bendichter commented 6 years ago

I have a use-case where I'd like to make session_start_time optional. I am storing ECoG data that has cortical surface information for the subject that does not change between sessions. I have decided to store this information in an external NWB file and link to the information so that I do not need to store this cortical surface data in every session file. So I have created an NWB file that just holds the cortical surface data. The problem is that to create this file I must have a session_start_time, which is really not relevant in this case. I tried entering null-like values like '', ' ' and 'NA', but there is a date checker in pynwb that does not allow any of these as valid entries. The only way to write the file is to make up a date. I chose datetime(1900, 1, 1), because it is unlikely to get confused with a real start time, but I consider this solution far from ideal. My proposal is to solve this problem either by making session_start_time optional or by specifying a standard null value and relaxing the date checker in pynwb to accept this null value. That would essentially accomplish the same goal, though a bit more indirectly.

tjd2002 commented 6 years ago

Seems reasonable—as I recall this came up with folks storing results of simulations, as well.

On Jun 8, 2018, at 4:15 AM, Ben Dichter notifications@github.com wrote:

I have a use-case where I'd like to make session_start_time optional. I am storing ECoG data that has cortical surface information for the subject that does not change between sessions. I have decided to store this information in an external NWB file and link to the information so that I do not need to store this cortical surface data in every session file. So I have created an NWB file that just holds the cortical surface data. The problem is that to create this file I must have a session_start_time, which is really not relevant in this case. I tried entering null-like values like '', ' ' and 'NA', but there is a date checker in pynwb that does not allow any of these as valid entries. The only way to write the file is to make up a date. I chose datetime(1900, 1, 1), because it is unlikely to get confused with a real start time, but I consider this solution far from ideal. My proposal is to solve this problem either by making session_start_time optional or by specifying a standard null value and relaxing the date checker in pynwb to accept this null value. That would essentially accomplish the same goal, though a bit more indirectly.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/NeurodataWithoutBorders/nwb-schema/issues/168, or mute the thread https://github.com/notifications/unsubscribe-auth/AAK3c9Z7-rAB2Qcnboc2JQmjBTqFMhmwks5t6ly9gaJpZM4Uf8Gk.

t-b commented 6 years ago

specifying a standard null value and relaxing the date checker in pynwb to accept this null value.

I would prefer that. Otherwise the average john doe user will just forget to add it.

neuromusic commented 6 years ago

I fully endorse this. I could have sworn that this was proposed once before.

However, I strongly oppose default values if the user leaves it null. This forces downstream processes to need to validate whether or not they can "trust" the datetime.

Take the Taylor Farm as an example...

As any geography nerd knows, the precise center of the United States is in northern Kansas, near the Nebraska border. Technically, the latitudinal and longitudinal coordinates of the center spot are 39°50′N 98°35′W. In digital maps, that number is an ugly one: 39.8333333,-98.585522. So back in 2002, when MaxMind was first choosing the default point on its digital map for the center of the U.S., it decided to clean up the measurements and go with a simpler, nearby latitude and longitude: 38°N 97°W or 38.0000,-97.0000.

As a result, for the last 14 years, every time MaxMind's database has been queried about the location of an IP address in the United States it can't identify, it has spit out the default location of a spot two hours away from the geographic center of the country. This happens a lot: 5,000 companies rely on MaxMind's IP mapping information, and in all, there are now over 600 million IP addresses associated with that default coordinate. If any of those IP addresses are used by a scammer, or a computer thief, or a suicidal person contacting a help line, MaxMind's database places them at the same spot: 38.0000,-97.0000.

Which happens to be in the front yard of Joyce Taylor's house.

https://splinternews.com/how-an-internet-mapping-glitch-turned-a-random-kansas-f-1793856052

Inaccurate data is much more dangerous for provenance tracking if the average john doe user forgets to add it than being explicit that the field was not populated.

bendichter commented 6 years ago

OK how about this as a compromise: session_start_time is optional in the schema, but required in pynwb.NWBFile. An argument for session_start_time must be supplied in NWBFile, but None is allowed, in which case no session_start_time is written to the file.

oruebel commented 6 years ago

For files that contain only non-timeseries data I think allowing None for session_start_time seems technically fine. However, as soon as timeseries data is stored in a file we should always require a valid session_start_time. I just really want to make sure that we are not starting to break the requirement that timeseries are aligned in a file and that we can relate data across time.

As a refinement to this discussion, I would therefore propose that:

I understand the reasoning for this change, but this is a fairly fundamental change and all I'm arguing for is that we need to be careful that we cover all the bases.

oruebel commented 6 years ago

@ajtritt any thoughts on this?

bendichter commented 6 years ago

Yeah all of that sounds good to me. It makes sense to be extra careful with a change like this