commfish / coho_known_age_study


Duplicate Sample_IDs #5

Closed justinpriest closed 5 years ago

justinpriest commented 5 years ago

Two Sample_IDs each appear twice: AL98_0519_0004 and AL98_0521_0029. When aggregating the data, this causes a conflict (and thus the rows of data differ between the old SEM code and the new JTP code). Right now, `spread` is arbitrarily choosing one row. The circulus distances appear to be exactly the same in both copies. For 0004, the IMAGENAME, Length, and Age columns differ slightly but everything else is the same. For 0029, the IMAGENAME, Date, and Length columns differ slightly but everything else is the same. Looking at the CSV file, it looks like these samples may have been re-aged. My code drops the second instance of each (where there were no blanks in Distances). Is it fine to keep these dropped?
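For anyone wanting to check for repeats like these, here is a minimal Python sketch (not the project's actual code). The function name `duplicated_ids` is hypothetical, and the `Length` values in the example rows are placeholders, not values from the CSV.

```python
from collections import Counter


def duplicated_ids(rows, key="Sample_ID"):
    """Return the key values that occur in more than one row, in first-seen order."""
    counts = Counter(row[key] for row in rows)
    return [value for value, n in counts.items() if n > 1]


# Placeholder rows: the Sample_IDs come from this issue, the Length values do not.
rows = [
    {"Sample_ID": "AL98_0519_0004", "Length": "placeholder_a"},
    {"Sample_ID": "AL98_0519_0004", "Length": "placeholder_b"},
    {"Sample_ID": "AL98_0521_0029", "Length": "placeholder_c"},
]

print(duplicated_ids(rows))
```

Running a check like this before reshaping the data makes the conflict visible up front, rather than leaving the choice of row to the aggregation step.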

fssem1 commented 5 years ago

If you look at the original data, some rows have blank data (including the two repeats you mentioned above). It is best to have a 'data clean' section at the top of your code where you keep only the rows in which Data_Pairs is not NA. This way the original data is not touched, and your issue should be resolved.
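A 'data clean' step along these lines could look like the following Python sketch. It is only an illustration, not the project's code: the helper name `drop_missing`, the set of values treated as missing, and the example rows are all assumptions.

```python
def drop_missing(rows, column="Data_Pairs", missing=("", "NA")):
    """Return a new list without rows whose `column` is absent, None, blank, or 'NA'.

    The input list is not modified, so the original data stays untouched.
    """
    cleaned = []
    for row in rows:
        value = row.get(column)
        if value is None or str(value).strip() in missing:
            continue  # skip rows with no usable value in this column
        cleaned.append(row)
    return cleaned


# Placeholder rows: only the column name and Sample_ID come from this issue.
raw = [
    {"Sample_ID": "AL98_0519_0004", "Data_Pairs": "placeholder"},
    {"Sample_ID": "AL98_0519_0004", "Data_Pairs": "NA"},
    {"Sample_ID": "AL98_0519_0004", "Data_Pairs": ""},
]

clean = drop_missing(raw)
print(len(raw), len(clean))
```

Because the filter builds a new list, every later step can work from `clean` while `raw` remains available for checking what was excluded.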

justinpriest commented 5 years ago

OK, I thought this might be the case, but I didn't want to exclude rows without checking first. The fix is now implemented.