Open luiztauffer opened 5 years ago
@ajtritt do you think you could help with this one?
At this point, I think the only solution to getting the behavior that you are expecting would be for us to inspect the data you pass in under the hood, and then make an assumption that the ragged nature of the passed in data means you want to index it. Not that I'm necessarily opposed to that, just mentioning what I think it would take. If we want to implement that feature, we should probably solicit feedback or hash this over on a Feature Request issue.
Fortunately for you @luiztauffer, there is a relatively painless work around. Try this:
from pynwb import NWBFile, NWBHDF5IO
from datetime import datetime
# Make data source file
nwb = NWBFile('aa','aa', datetime.now().astimezone())
with NWBHDF5IO('test.nwb', 'w') as io:
nwb.add_unit()
nwb.add_unit()
nwb.add_unit()
################################
# concatenate in a single list
new_data = ['a','a','b','a','b','c']
# specify the index for the concatenated data
nwb.add_unit_column(name='new_col', description='', data=new_data, index=[1, 3, 6])
################################
print(nwb.units['new_col'][2])
print(type(nwb.units['new_col'][2]))
io.write(nwb)
io = NWBHDF5IO('test.nwb', 'r')
nwb = io.read()
print(nwb.units['new_col'][2])
print(type(nwb.units['new_col'][2]))
io.close()
Should yield this:
['a', 'b', 'c']
<class 'list'>
['a' 'b' 'c']
<class 'numpy.ndarray'>
Let me know if that works
thanks @ajtritt it works with that work around!
@ajtriit thanks for the workaround! That's certainly lifts the immediate block.
I think ultimately though it is a bug for a list of strings to be automatically turned into a string of that list. Inspecting to see if an array is ragged and treating it accordingly seems like a good idea to me.
I think ultimately though it is a bug for a list of strings to be automatically turned into a string of that list.
@bendichter Yeah, this lies somewhere in the gray area between bug and undocumented gotcha. I think the original code only works because of how h5py handles strings. If you change the data to integers, you get a broadcast error.
Inspecting to see if an array is ragged and treating it accordingly seems like a good idea to me.
Full inspection is a costly operation, and I would like to avoid doing that every time only to support this edge case. If a user wants to pass in data like this, they will at least need to specify that it is irregularly shaped. Under the hood, the data can be unraveled and an index can be created. Something like this:
new_data = [['a'],['a','b'],['a','b','c']]
nwb.add_unit_column(name='new_col', description='', data=new_data, index=True)
I think that ragged data will not be uncommon, the case in point is unit spike features (attributing a value for each spike from every unit), units will most of the times have different spike counts. I think that indexing as you suggested is quite straightforward for the user to deal with, but perhaps those particular cases should be better described at add_unit_column doc
Full inspection is a costly operation, and I would like to avoid doing that every time only to support this edge case.
I agree. I think requiring that a user set index=True
when adding a ragged column is fine.
perhaps those particular cases should be better described at add_unit_column doc
Agreed
1) Bug
Added unit_columns items are saved as strings, when they're actually lists. In the code below, notice the printing values: the items should be lists, and are added correctly to the nwb object. But once saved, the become strings.
Steps to Reproduce
This prints:
Checklist