Closed vinicvaz closed 1 month ago
The array should be inhomogeneous, but it seems like there's an extra bracket that we don't need. What's the benefit of having it as a data type object vs a list of lists?
The output of the method is a list of inhomogeneous lists, and that is not wrong. But when saving to .npy it is automatically converted to a NumPy array, and NumPy arrays should be homogeneous, so the save breaks. Converting it with dtype=object makes the save work, because a NumPy array of dtype object behaves much like a Python list.
But I'm not sure if this can be done here and if the data types can be treated the way I did.
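For reference, here is a minimal sketch of the save/load round trip described above. The sample data is the inner list quoted in the PR description; the variable names and the temporary file path are assumptions for illustration only and are not taken from the repository.

```python
import os
import tempfile

import numpy as np

# Ragged (inhomogeneous) list of communities, using the values quoted in this thread.
communities = [[8, 2], [7, 11, 5, 10, 9, 4, 13, 0, 1, 14], [6, 3]]

# dtype=object stores each inner list as a Python object, so unequal lengths are allowed.
as_object = np.array(communities, dtype=object)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "communities.npy")  # hypothetical file name
    np.save(path, as_object)  # object arrays are pickled inside the .npy file
    loaded = np.load(path, allow_pickle=True)  # loading object arrays requires allow_pickle=True

# Convert back to plain lists for downstream code that expects a list of lists.
restored = [list(c) for c in loaded]
print(restored == communities)
```

One caveat with this approach: if every inner list happens to have the same length, np.array(..., dtype=object) builds a 2-D array instead of a 1-D array of lists, so downstream code should not assume a fixed shape.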
Did you run into similar issues with get_community_labels()? With the modifications you made in get_cohort_community_labels(), it's nearly the same as the get_community_labels() function.
As for changing communities_all, let me look around to see if the difference in data type will cause downstream problems.
No, get_community_labels worked fine. What exactly is the difference between these two methods?
What kind of data should each one handle? I'm a bit confused because I don't understand the data well, or what is expected as input, output, and behavior. I think the best thing to do is for you to test the changes I made with your data and knowledge and let me know if they make sense. What do you think?
@katiekly Maybe the problem was in create_cohort_community_bag. I just pushed an update that fixes this method so it returns the array in the expected dimensions.
Fixing #16
Now the pipeline is passing, but we need to review two main points:
1. communities_all on line 562. Currently it is an inhomogeneous array like: [[[8, 2], [7, 11, 5, 10, 9, 4, 13, 0, 1, 14], [6, 3]]]. Is that correct? If not, we should start by fixing the create_cohort_community_bag method.
2. Whether the changes to get_cohort_community_labels make sense. Also, for saving it as NumPy we should use dtype=object, since plain NumPy arrays should be homogeneous.
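To make the first point concrete, the sketch below shows how the extra outer bracket changes the shape of the resulting object array. It uses only the value quoted above; whether the outer wrapper is actually intended depends on create_cohort_community_bag, which is not shown here, so treat the unwrapping step as an assumption rather than the agreed fix.

```python
import numpy as np

# Value of communities_all as quoted in the description (note the extra outer bracket).
communities_all = [[[8, 2], [7, 11, 5, 10, 9, 4, 13, 0, 1, 14], [6, 3]]]

# With the wrapper, NumPy sees one row holding three list objects.
wrapped = np.array(communities_all, dtype=object)
print(wrapped.shape)  # (1, 3)

# Dropping the singleton outer level gives a 1-D array of three communities.
flat = np.array(communities_all[0], dtype=object)
print(flat.shape)  # (3,)
```

If one cohort is expected to hold one list of communities per entry, the wrapped form may be correct after all; the shapes above are only meant to help decide which layout downstream code should receive.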