Fix/analysis - Githubissues

vinicvaz commented 1 month ago

Fixing #16

Now the pipeline is passing but we need to review two main points:

[x] What is the expected data type for the variable communities_all on line 562? Currently it is an inhomogeneous array like: [[[8, 2], [7, 11, 5, 10, 9, 4, 13, 0, 1, 14], [6, 3]]]. Is that correct? If not, we should start by fixing the create_cohort_community_bag method.
[x] If the above array can be inhomogeneous, we should check whether the modifications to get_cohort_community_labels make sense. Also, to save it with numpy we should use dtype=object, since regular numpy arrays must be homogeneous.

katiekly commented 1 month ago

The array should be inhomogeneous, but it seems like there's an extra bracket that we don't need. What's the benefit of having it as a data type object vs a list of lists?
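To make the extra bracket concrete, here is a minimal sketch using the example value from the description above. The helper name drop_singleton_wrapper is hypothetical and not part of VAME, and it assumes the outer level really is a redundant single-element wrapper, which is the open question in this thread.

```python
# Example value copied from the PR description.
communities_all = [[[8, 2], [7, 11, 5, 10, 9, 4, 13, 0, 1, 14], [6, 3]]]


def drop_singleton_wrapper(value):
    """Remove one outer list level if it only wraps a single list.

    Hypothetical helper for illustration; it leaves any other input unchanged.
    """
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], list):
        return value[0]
    return value


# The outer list has exactly one element, which holds the three communities.
communities = drop_singleton_wrapper(communities_all)
```

Whether unwrapping is appropriate depends on what the outer level is meant to index, so this only shows the shape difference and is not a proposed fix.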

vinicvaz commented 1 month ago

The array should be inhomogeneous, but it seems like there's an extra bracket that we don't need. What's the benefit of having it as a data type object vs a list of lists?

The output of the method is a list of inhomogeneous lists, and that is not wrong. But when saving to .npy it is automatically converted to a numpy array, and numpy arrays should be homogeneous, so the save process breaks. Converting it to dtype=object makes the save work, because a numpy array of dtype object behaves much like a Python list. But I'm not sure whether this can be done here, or whether the data types can be treated the way I did.
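As a rough illustration of the dtype=object approach, the sketch below saves and reloads the example value from the description. The helper names (to_object_array, save_ragged, load_ragged) and the file name are assumptions made for this example and are not taken from the VAME code.

```python
import os
import tempfile

import numpy as np

# Example value copied from the PR description.
communities_all = [[[8, 2], [7, 11, 5, 10, 9, 4, 13, 0, 1, 14], [6, 3]]]


def to_object_array(ragged):
    """Put each top-level item of a ragged list into a 1-D object array."""
    out = np.empty(len(ragged), dtype=object)
    for i, item in enumerate(ragged):
        out[i] = item  # stored as a Python object, not broadcast
    return out


def save_ragged(path, ragged):
    # Object arrays are pickled inside the .npy file.
    np.save(path, to_object_array(ragged), allow_pickle=True)


def load_ragged(path):
    # allow_pickle=True is required to read object arrays back.
    return np.load(path, allow_pickle=True).tolist()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "communities_all.npy")
    save_ragged(path, communities_all)
    restored = load_ragged(path)
```

Filling a pre-allocated object array keeps the result one-dimensional even when the inner lists happen to have equal lengths, which np.array(..., dtype=object) does not guarantee. Loading needs allow_pickle=True, so any downstream code that reads the file would have to pass it as well.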

katiekly commented 1 month ago

Did you run into similar issues with get_community_labels()? With the modifications you made in get_cohort_community_labels(), it's nearly the same as the get_community_labels() function.

As for changing the communities_all, let me look around to see if the difference in data type will cause downstream problems

vinicvaz commented 1 month ago

Did you run into similar issues with get_community_labels()? With the modifications you made in get_cohort_community_labels(), it's nearly the same as the get_community_labels() function.

As for changing the communities_all, let me look around to see if the difference in data type will cause downstream problems

No, get_community_labels worked fine. What exactly is the difference between these two methods? What kind of data should each one handle? I'm a bit confused about this because I don't understand the data well, or what is expected as input, output, and behavior. I think the best approach is for you to test the changes I made against your data and knowledge and let me know whether they make sense. What do you think?

vinicvaz commented 1 month ago

@katiekly Maybe the problem was in create_cohort_community_bag. I just pushed an update that fixes this method so it returns the array in the expected dimensions.

EthoML / VAME

Fix/analysis #19