RMI-PACTA / workflow.data.preparation

The goal of `workflow.data.preparation` is to prepare all of the necessary data inputs for the Transition Monitor web application.

deal with `left_join()` many-to-many relationship warnings #101

Closed cjyetman closed 9 months ago

cjyetman commented 9 months ago

A few `left_join()` calls in the `abcd_flags` section of `run_pacta_data_preparation.R` throw many-to-many relationship warnings.

In each case, we should verify whether the many-to-many relationship is truly expected (conceptually). If it is, the warning can be silenced by explicitly specifying `relationship = "many-to-many"` in the `left_join()` call. If it is not, we need to find out why the join keys are duplicated on both sides and fix that.

I have identified the following lines that exhibit this behavior:

https://github.com/RMI-PACTA/workflow.data.preparation/blob/c08b8fb5a139716997ae2c0e83a01757d51a9a2d/run_pacta_data_preparation.R#L359

https://github.com/RMI-PACTA/workflow.data.preparation/blob/c08b8fb5a139716997ae2c0e83a01757d51a9a2d/run_pacta_data_preparation.R#L360

https://github.com/RMI-PACTA/workflow.data.preparation/blob/c08b8fb5a139716997ae2c0e83a01757d51a9a2d/run_pacta_data_preparation.R#L384

https://github.com/RMI-PACTA/workflow.data.preparation/blob/c08b8fb5a139716997ae2c0e83a01757d51a9a2d/run_pacta_data_preparation.R#L385

https://github.com/RMI-PACTA/workflow.data.preparation/blob/c08b8fb5a139716997ae2c0e83a01757d51a9a2d/run_pacta_data_preparation.R#L389-L392
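
For illustration, here is a minimal sketch of the two options described above, assuming dplyr >= 1.1.0 (the version that introduced the `relationship` argument). The data frames `toy_holdings` and `toy_flags` and their columns are hypothetical stand-ins, not the actual objects used in the `abcd_flags` section.

```r
library(dplyr)

# Hypothetical toy data: company_id 1 is duplicated on BOTH sides of the join.
toy_holdings <- tibble(company_id = c(1, 1, 2), isin = c("A", "B", "C"))
toy_flags <- tibble(company_id = c(1, 1, 2), sector = c("Power", "Coal", "Power"))

# Default behaviour: the join succeeds, but dplyr warns about an unexpected
# many-to-many relationship. Here we only record whether a warning was raised.
warned <- tryCatch(
  {
    left_join(toy_holdings, toy_flags, by = "company_id")
    FALSE
  },
  warning = function(w) TRUE
)

# Option 1: the many-to-many relationship is expected, so declare it.
# This returns the same rows as the default call, without the warning.
joined_explicit <- left_join(
  toy_holdings,
  toy_flags,
  by = "company_id",
  relationship = "many-to-many"
)

# Option 2: the relationship is NOT expected. Declaring the relationship we do
# expect (here many-to-one) turns the silent row duplication into an error.
strict_failed <- tryCatch(
  {
    left_join(toy_holdings, toy_flags, by = "company_id", relationship = "many-to-one")
    FALSE
  },
  error = function(e) TRUE
)
```

Declaring the expected relationship in each call has the side benefit that a future change in the input data that breaks the assumption fails loudly instead of quietly multiplying rows.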

cjyetman commented 9 months ago

The current Company_ID_List file from AI that I've been testing with, "2023-02-15_AI_RMI_Bespoke_Company_Data_Products_Company_ID_List_2022Q4.csv", appears to have one duplicate row, and these warnings appear to be a symptom of that. I'll open a PR that enforces distinct rows when that file is read in.
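
A rough sketch of what enforcing distinct rows could look like, using `dplyr::distinct()`. The inline `company_id_list` table and its columns are hypothetical and only stand in for the real CSV contents; the actual PR may read and de-duplicate the file differently.

```r
library(dplyr)

# Hypothetical stand-in for the Company_ID_List file, with one fully
# duplicated row (company_id 2 appears twice with identical values).
company_id_list <- tibble(
  company_id = c(1, 2, 2),
  company_name = c("Alpha", "Beta", "Beta")
)

# Drop rows that are exact duplicates across all columns.
company_id_lookup <- distinct(company_id_list)

# Hypothetical left-hand table with repeated company_id values.
asset_rows <- tibble(company_id = c(1, 2, 2), asset = c("x", "y", "z"))

# With the lookup de-duplicated, each left-hand row matches at most one
# lookup row, so the join can be declared (and checked) as many-to-one.
joined_lookup <- left_join(
  asset_rows,
  company_id_lookup,
  by = "company_id",
  relationship = "many-to-one"
)
```

Note that `distinct()` with no column arguments only removes rows that are identical in every column; two rows sharing a `company_id` but differing elsewhere would still be kept and would still need investigating.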