Closed: kdharris101 closed this issue 7 years ago
I'm not exactly sure how to model this.
```python
class TimeSeries(BaseModel):
    file = models.ForeignKey(Dataset,
                             help_text="t-by-n array where t is the number of timepoints "
                                       "and n is the number of traces")
    column_names = models.ListField()  # ??? (no such built-in Django field)
    description = models.TextField(null=True, blank=True)
    timestamps = models.ManyToManyField(Timestamp)
    experiment = models.ForeignKey(BaseAction)
```
Questions:
- `column_names`? Options: use a single comma-separated string, or a many-to-many relationship to a new ColumnName model...
- `experiment`: ForeignKey? BaseAction or something else?

@nippoo @nsteinme
Looks like there is something called an ArrayField, will that work? https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/fields/ Otherwise yes, I think a comma-separated string is the way to go for simplicity.
The ForeignKey for experiment is to actions.models.Experiment
This is the link to the dataset and some metadata! I don't know whether I understand the last question.
Right, ArrayField seems like a good solution. I guess my last question was: generally speaking, how do you intend to use this? Why not just use a Dataset for example? What are your use cases exactly?
So, this would be used to store analog traces recorded by some device. For instance, we currently record multiple signals with a National Instruments data acquisition card, including: position of the wheel (i.e. behavioral manipulandum), photodiode reading (to synchronize visual stimulus presentation), strobes from cameras (to synchronize), etc.
Or any other set of analog traces that share a common timebase, for instance including electrophysiological data.
So it's a very general (and important) type of dataset.
Currently dataset doesn't have the fields for column names or timestamps, which of course we need to record. How would you recommend we record these?
OK I understand. I imagined you could save the column names directly in the dataset, on disk, rather than in the database, but that's perfectly fine.
Currently these datasets are in the form of matlab structs, which have one field for the data matrix, one field for the timestamps, and a cell array of channel names. But we want to do npy files for the new system. So perhaps we should pick a new standard filename for that kind of metadata? Like:
myData.npy
myData.timestamps.npy
myData.columnNames.npy  % string with comma-separated column names
or myData.columnNames.json
@kdharris101 ?
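As a concrete sketch of the json sidecar variant (the filename follows the proposed convention; the channel names are just examples, and the temp directory stands in for wherever myData.npy would live):

```python
import json
import os
import tempfile

# Example channel names; in practice these would come from the recording rig.
column_names = ["chrono", "photoDiode", "rotaryEncoder"]

# Write the sidecar next to where myData.npy would live.
root = tempfile.mkdtemp()
sidecar = os.path.join(root, "myData.columnNames.json")
with open(sidecar, "w") as f:
    json.dump(column_names, f)

# Reading it back recovers the list exactly.
with open(sidecar) as f:
    loaded = json.load(f)
```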
This actually raises an important general question. To what extent do we want metadata to be in the npy files, and to what extent in the database. There is no harm in duplicating, but in this case we need to decide which one is “master”. The way I did this before – when I used an sql system as a postdoc, and before that working as a db programmer in a phone company – is that files are the master, and the database is a tool that allows you to search files easily, and you are prepared to wipe and recreate the database from files all the time. However I realize this is not how things usually work in industry, the database is usually the master.
One other point: comma-delimited text files are a disaster. Because all it takes is someone to type in a comma into a text field and everything is screwed. Tab-delimited is less likely to have this problem. But json is surely best.
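The failure mode is easy to demonstrate (made-up channel names):

```python
import json

# One user-entered name happens to contain a comma.
names = ["wheelPosition", "photodiode, raw"]

# The comma-separated round trip silently splits the second name in two,
# yielding three columns instead of two.
joined = ",".join(names)
recovered = joined.split(",")

# json round-trips exactly, commas and all.
restored = json.loads(json.dumps(names))
```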
Yes, you're right, let's go with json on the column names.
Or an ArrayField? I see that ArrayFields are already used in the code base.
I guess we want to have the column names also on disk, as per Kenneth's suggestion that this sort of thing be rebuildable, which seems sensible to me. So json for disk, array field for database, if that seems all right to you.
Within the database, no reason not to have an ArrayField.
If we are talking about an external file I would go with either tab-delimited or json. We should try to be consistent in how we represent text/mixed-type data in external files. The advantage of tab-delimited is human readability. The advantage of json is flexibility. Thinking more about it I tend towards tab-delimited.
The more general question is what are we planning to represent in external files, that isn’t purely numerical?
What's the drawback of just saving the column names on disk and not in the database? Why do we need them in the database, do we ever need to do queries on these column names?
You might want to search for datasets with a column called "piezoLickDetector" for instance, if you want to find datasets that have that kind of data to analyze. That's probably a rare use case, but it just feels insufficient to me to have the database say "here's a 15 x 10000 matrix. No idea what it contains." Why not be able to label?
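With the names stored as a Postgres array, that search is a single membership test. A sketch of the raw query (the table and column names here are hypothetical; in Django the equivalent would be an ArrayField `__contains` lookup):

```python
def find_datasets_with_column(column_name):
    """Build a parameterized Postgres query matching time series that
    contain a trace with the given name (hypothetical table name)."""
    sql = (
        "SELECT file_id FROM data_timeseries "
        "WHERE %s = ANY(column_names);"
    )
    return sql, (column_name,)

sql, params = find_datasets_with_column("piezoLickDetector")
```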
If it helps this discussion at all, here are examples of the contents of the four common kinds of structs that are saved by experiments in the lab. "block" is for behavioral experiments with Choiceworld and for signals. "timeline" is signals recorded by a ni-daq. "parameters" can be any kind of experiment's parameters. "Protocol" is an mpep thing. I think we're doing well though - there are some things that will be fine in EventSeries, some that are fine in TimeSeries, and the rest either match existing exp-metadata fields (like start_time) or will be json, so we should be set!
block =
expType: 'ChoiceWorld'
trial: [1x229 struct]
stimWindowUpdateTimes: [5656x1 double]
stimWindowUpdateLags: [5656x1 double]
startDateTime: 7.3668e+05
startDateTimeStr: '18-Dec-2016 18:56:57'
parameters: [1x1 struct]
endStatus: 'quit'
rewardDeliveredSizes: [155x1 double]
rewardDeliveryTimes: [1x155 double]
rigName: 'zgood'
expRef: '2016-12-18_2_Cori'
experimentInitTime: 0.2619
experimentStartedTime: 0.2702
experimentEndedTime: 1.1945e+03
experimentCleanupTime: 1.1945e+03
endDateTime: 7.3668e+05
endDateTimeStr: '18-Dec-2016 19:16:51'
numCompletedTrials: 228
duration: 1.1943e+03
inputSensorPositions: [372390x1 double]
inputSensorPositionTimes: [372390x1 double]
inputSensorGain: 10.0025
lickCounts: []
lickCountTimes: []
>> block.trial(1)
ans =
condition: [1x1 struct]
trialStartedTime: 0.2722
intermissionStartedTime: 0.2732
quiescenceWatchStartedTime: 0.2807
visCuePhase: [4.8067 4.7062]
quiescenceWatchEndedTime: 9.6450
quiescentEpochTime: 9.6460
intermissionEndedTime: 9.6470
onsetToneSoundPlayedTime: [9.6566 10.3816]
stimulusBackgroundStartedTime: 9.6606
stimulusCueStartedTime: 9.6616
interactiveStartedTime: 10.3716
interactiveZeroInputPos: -49108
interactiveMovementTime: [10.3855 10.3909 10.4041 10.4206 10.4372 10.4535 10.4703]
inputThresholdCrossedTime: 10.4767
interactiveEndedTime: 10.4835
responseMadeTime: 10.4871
feedbackStartedTime: 10.4881
feedbackType: 1
feedbackPositiveStartedTime: 10.4892
responseMadeID: 1
inputThresholdCrossedID: 1
feedbackPositiveEndedTime: 11.4735
feedbackEndedTime: 11.4746
stimulusCueEndedTime: 11.4755
stimulusBackgroundEndedTime: 11.4765
trialEndedTime: 11.4774
feedbackNegativeStartedTime: []
negFeedbackSoundPlayedTime: []
feedbackNegativeEndedTime: []
>> parameters
parameters =
experimentFun: @UNKNOWN Function
experimentFunDescription: 'Function to create the experiment, takes 2 arguments: the pa...'
type: 'ChoiceWorld'
rewardVolume: 2.8000
rewardVolumeUnits: 'µl'
rewardVolumeDescription: 'Reward volumn delivered on each correct trial'
onsetVisStimDelay: 0
onsetVisStimDelayUnits: 's'
onsetVisStimDelayDescription: 'Duration between the start of the onset tone and visual stim...'
onsetToneDuration: 0.1000
onsetToneDurationUnits: 's'
onsetToneDurationDescription: 'Duration of the onset tone'
onsetToneRampDuration: 0.0100
onsetToneRampDurationUnits: 's'
onsetToneRampDurationDescription: 'Duration of the onset tone amplitude ramp (up and down each ...'
preStimQuiescentPeriod: [2x1 double]
preStimQuiescentPeriodUnits: 's'
preStimQuiescentPeriodDescription: 'Required period of no input before stimulus presentation'
bgCueDelay: 0
bgCueDelayUnits: 's'
bgCueDelayDescription: 'Delay period between target column presentation and grating cue'
cueInteractiveDelay: [2x1 double]
cueInteractiveDelayUnits: 's'
cueInteractiveDelayDescription: 'Delay period between grating cue presentation and interactiv...'
responseWindow: 1.5000
responseWindowUnits: 's'
responseWindowDescription: 'Duration of window allowed for making a response'
positiveFeedbackPeriod: 1
positiveFeedbackPeriodUnits: 's'
positiveFeedbackPeriodDescription: 'Duration of positive feedback phase (with stimulus locked in...'
negativeFeedbackPeriod: 1
negativeFeedbackPeriodUnits: 's'
negativeFeedbackPeriodDescription: 'Duration of negative feedback phase (with stimulus locked in...'
[etc]
>> Timeline
Timeline =
expRef: '2016-12-18_1_Cori'
savePaths: {2x1 cell}
isRunning: 0
hw: [1x1 struct]
rawDAQData: [9392500x19 double]
rawDAQSampleCount: 9392500
datFID: 3
startDateTime: 7.3668e+05
startDateTimeStr: '18-Dec-2016 18:56:11'
nextChronoSign: -1
lastTimestamp: 3.7570e+03
lastClockSentSysTime: 5.3788e+06
currSysTimeTimelineOffset: 5.3750e+06
figHandle: []
rawDAQTimestamps: [1x9392500 double]
>> Timeline.hw
ans =
daqVendor: 'ni'
daqDevice: 'Dev1'
daqSampleRate: 2500
daqSamplesPerNotify: []
chronoOutDaqChannelID: 'port0/line1'
acqLiveDaqChannelID: 'port0/line8'
useClockOutput: 1
clockOutputChannelID: 'ctr1'
clockOutputFrequency: 70
clockOutputDutyCycle: 0.1000
clockOutputInitialDelay: 0.5000
camSyncPulse: 1
camSyncPulsePauseDuration: 0.2000
camSyncDaqChannelID: 'port0/line3'
stopDelay: 2
makePlots: 1
figPosition: [50 50 1700 900]
figScales: [1 0.5000 3 1 1 1 10 1 1 10 1 8 1 1 1 1 1 1 1]
recordAudio: 0
audioRecDevice: 1
audioRecFs: 192000
writeDat: 1
dataType: 'double'
samplingInterval: 4.0000e-04
inputs: [1x19 struct]
arrayChronoColumn: 1
>> columnLabels = {Timeline.hw.inputs.name}
columnLabels =
Columns 1 through 6
'chrono' 'photoDiode' 'rotaryEncoder' 'eyeCameraStrobe' 'waveOutput' 'openChan1'
Columns 7 through 12
'piezoLickDetector' 'openChan2' 'camSync' 'whiskCamStrobe' 'rewardEcho' 'audioMonitor'
Columns 13 through 17
'faceCamStrobe' 'blueLEDmonitor' 'purpleLEDmonitor' 'pcoExposure' 'acqLive'
Columns 18 through 19
'tlExposeClock' 'stimScreen'
>> Protocol
Protocol =
xfile: 'stimGratingAndLaserCommands.x'
adapt: [1x1 struct]
nstim: []
npfilestimuli: 28
npars: 28
pars: [28x28 double]
parnames: {28x1 cell}
pardefs: {28x1 cell}
animal: 'Noam'
iseries: '2016-12-11'
iexp: 5
nrepeats: 20
seqnums: [28x20 double]
Should be done. See https://github.com/cortex-lab/alyx/commit/166b7ceec564995a367ca67bbc38e04c355695cf#diff-488537eccebb33b949b0a1235628c053R156 if you want to double-check.
When I was writing SQL queries I realized that there appears to be a convention that fields with UUIDs as values have "_id" at the end of the field name - did you notice that, is it true? If so, let's probably go with file_id, experiment_id, etc.?
Yes, I believe that was the convention.
I think Django automatically appends _id for the SQL columns, but this suffix should not appear in the Python models. See https://docs.djangoproject.com/en/1.10/ref/models/fields/#database-representation
Got it, in that case looks good as far as I can see.
A TimeSeries document links to a file containing one or more time series:
- file: Dataset, a t-by-n array where t is the number of timepoints and n is the number of traces
- column_names: array field of 1024-char base fields; length equals the number of traces
- description: text
- timestamps: array of datasets (of the new type, Timestamp); can be more than one
- experiment_id: link to the actions table