eclipse-pass / main

Catch all repository against which issues of general, cross cutting topics are logged.
Apache License 2.0
4 stars 8 forks source link

Update grant loader to use FIBI instead of COEUS #688

Open rpoet-jh opened 1 year ago

rpoet-jh commented 1 year ago

Summary from meeting with Bob/ORIS for Coeus -> Fibi migration:

Bob stated that based on the SQL that was sent, he doesn't see any issues at this time. He is going to rewrite the SQL a bit so that is conforms with MySQL SQL and send it over to us. We should expect the new queries in a few weeks (he may get to them this week, but he is out next week).

There will be new a connection string and credentials, Bob will send those too.

Bob suggested that we take the updated SQL, and test it against their test system before Oct. 23 when Fibi PROD is going live.

Bob recommended that we should schedule our systems to update to use Fibi as soon as possible after Oct. 23 so that our data remains current.

rpoet-jh commented 1 year ago

From Bob:

Database: MySQL Server Name: sdfibi.johnshopkins.edu Schema: jhufibi Port: 3306 User Name: jhu_data_user Password: (ask Russ for password)

Test Query:

SELECT SPONSOR_NAME, SPONSOR_CODE FROM jhufibi. jhu_sponsor_view WHERE SPONSOR_CODE IN (‘300865’);

Bob is still working on rewriting our existing query, but this is a sample and connection details.

rpoet-jh commented 11 months ago

I sent Bob an email as follows:

We are starting to work on the change in PASS to transition from Coeus to Fibi. At this point, I have two questions:

dkriethof commented 10 months ago

moving to blocked until you hear back from Bob. also moving to post 1.2.0 since there is little chance this can get done before code freeze.

rpoet-jh commented 7 months ago

SQL has arrived from Bob:

Connection details to fibi training instance are in lastpass.


SELECT A.AWARD_ID
  ,A.AWARD_STATUS
  ,A.SAP_GRANT_NUMBER
  ,A.TITLE
  ,A.AWARD_DATE
  ,A.AWARD_START_DATE
  ,A.AWARD_END_DATE
  ,A.SPONSOR_NAME
  ,A.SPONSOR_CODE
  ,A.UPDATE_TIMESTAMP
  ,B.ROLE
  ,B.EMPLOYEE_ID
  ,C.FIRST_NAME
  ,C.MIDDLE_NAME
  ,C.LAST_NAME
  ,C.EMAIL_ADDRESS
  ,C.JHED_ID
  ,A.PRIME_SPONSOR_NAME
  ,A.PRIME_SPONSOR_CODE
FROM JHU_FIBI_AWD_VIEW A
    ,JHU_FIBI_AWD_INV_VIEW B
    ,JHU_PERSON_VIEW C
WHERE   A.UPDATE_TIMESTAMP > STR_TO_DATE('02/01/2024', '%m/%d/%Y')
AND        STR_TO_DATE(A.AWARD_END_DATE, '%m/%d/%Y') >= STR_TO_DATE('02/07/2024', '%m/%d/%Y')
AND        B.ROLE != 'KP'
AND        A.SAP_GRANT_NUMBER = B.SAP_GRANT_NUMBER
AND        B.EMPLOYEE_ID = C.EMPLOYEE_ID;

SELECT FIRST_NAME
  ,MIDDLE_NAME
  ,LAST_NAME
  ,EMAIL_ADDRESS
  ,USER_NAME AS JHED_ID
  ,PERSON_ID AS EMPLOYEE_ID
  ,UPDATE_TIMESTAMP
FROM  PERSON
WHERE UPDATE_TIMESTAMP > '2024-01-01';

SELECT SPONSOR_NAME
    ,SPONSOR_CODE
FROM   JHU_SPONSOR_VIEW 
WHERE  SPONSOR_CODE IN ('300865', '300866');
rpoet-jh commented 7 months ago

Started on this, sent the following email to Bob:

Hi Bob,

I was able to connect to the training instance of fibi and execute the queries successfully. Based on that exercise, I have the following questions:

I noticed many duplicated rows in the result set. Seems like in most cases, it is because there are duplicate rows for a grant in JHU_FIBI_AWD_VIEW:

-- returns 9 rows, rows are triplicated because there are triplicate rows in JHU_FIBI_AWD_VIEW for 125027 SELECT * FROM JHU_FIBI_AWD_VIEW A ,JHU_FIBI_AWD_INV_VIEW B ,JHU_PERSON_VIEW C WHERE A.SAP_GRANT_NUMBER = B.SAP_GRANT_NUMBER AND B.EMPLOYEE_ID = C.EMPLOYEE_ID AND A.sap_grant_number = '125027';

-- returns 3 duplicate rows, with distinct, returns 1 row SELECT * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027'

Are all coeus grants/funder/person data in fibi? I didn't see anything that would lead me to believe it's not, but I wanted to confirm if we need to do anything to the data we have in our PASS database to start using fibi? For example, all terminated grants from a particular date were not migrated to fibi, so we should terminates this set of grants before using fibi...

Did the direct or primary sponsor name or codes change in fibi compared to coeus. I did a random sampling, and they appear to be the same, but just wanted to confirm.

Please confirm the SPONSOR_NAME and SPONSOR_CODE are for the Direct Funder, and PRIME_SPONSOR_NAME and PRIME_SPONSOR_CODE are for the Primary Funder.

What are the possible values for JHU_FIBI_AWD_INV_VIEW.ROLE: PI = Primary Investigator, Co-I = Co-Investigator?

What are the possible values for JHU_FIBI_AWD_VIEW.AWARD_STATUS: Active, Terminated

The JHU_FIBI_AWD_VIEW.AWARD_DATE format is dd/mm/yyyy, just confirming since it is different that award start and end date format (mm/dd/yyyy)

After we clarify the questions above, I'd like to get access to the PRODUCTION instance of Fibi so I can do some more data analysis with the live data of fibi.

dkriethof commented 6 months ago

@rpoet-jh should we block this until we hear back from Bob?

rpoet-jh commented 6 months ago

Bob replied with:

  1. Records can have multiple personnel. You are getting a row for each person. You will want to limit the query to the PI if you only want one row. Add this: AND b.principal_investigator_flag = ‘Y’
  2. Yes.
  3. The codes and names are the same. SAP is the source system. That data feeds both Coeus and Fibi.
  4. Correct.
  5. There are 3 possible roles. PI (Principal Investigator); Co-PI (Co-Principal Investigator) ; Co-I (Co-Investigator)
  6. Yes.
  7. Yes. We will get this fixed. Our developers are in India – where this format is standard. I’ll let you know when we move the updated code into TR and production.

I'll investigate to determine if more follow-up is needed

rpoet-jh commented 6 months ago

Sent the following reply to Bob:

Thanks for the reply. I'm good with all the answers except question 1.

I tried adding AND b.principal_investigator_flag = ‘Y’ to the following query:

SELECT * FROM JHU_FIBI_AWD_VIEW A ,JHU_FIBI_AWD_INV_VIEW B ,JHU_PERSON_VIEW C WHERE A.SAP_GRANT_NUMBER = B.SAP_GRANT_NUMBER AND B.EMPLOYEE_ID = C.EMPLOYEE_ID AND b.principal_investigator_flag = 'Y' AND A.sap_grant_number = '125027';

but I get back 3 rows for the same PI.

The issue is in JHU_FIBI_AWD_VIEW I believe. If I run the following query, I see 3 duplicate rows returned:

SELECT * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027';

If I add distinct, only 1 row is returned:

SELECT distinct * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027';

rpoet-jh commented 6 months ago

More communication I sent to Bob:

Follow-up on the query duplicate question. Looks like the duplicate issue may only be an issue in the training fibi instance. I connected to PROD fibi, and ran the following query and one row returned:

SELECT * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027';

rpoet-jh commented 6 months ago

More communication I sent to Bob:

Sorry for the multiple emails, but I still see the duplicates in PROD fibi. I just didn't see them for that grant in that view. I now see the duplicate rows in JHU_FIBI_AWD_INV_VIEW.

I see many PIs that are duplicated in the JHU_FIBI_AWD_INV_VIEW view. See this query for an example:

SELECT * FROM JHU_FIBI_AWD_INV_VIEW A WHERE A.sap_grant_number = '114501';

rpoet-jh commented 6 months ago

Couple conclusions on fibi so far: -I'm bugging Bob about the many duplicate rows I see in the Fibi views we will use. This isn't great, and hopefully they can fix the issue to make processing more efficient. However, it isn't a blocker for us to move forward since grant loader needs to handle duplicate rows anyway in it's processing logic. -Fibi queries are relatively slow in PROD. I'm running a query to get all updated rows since 2/20/24, and it is taking 5 minutes to return the first 500 rows. Once again, this isn't the end of the world, but it seems like it should be faster. I'm not sure if I should complain about this to Bob or just live with it?

rpoet-jh commented 6 months ago

Here are a few more Fibi/Coeus data analysis points. I ran the grant pull query to get all rows updated since 2/26/2024. I see major differences in the result sets.

fibi 4913 total rows with duplicates 3441 deduped

coeus 1004 there were 13 duplicates

grant 124878 coeus has 6 co-pis fibi has no co-pis

grant row diffs fibi has 2179 grants returned that coeus does not coeus has 8 grants returned that fibi does not

I'm not sure what to make of these differences, but they seem significant. They large diffs in total rows returned is surprising. i'm not sure which is correct, could it be the semantics of updated_timestamp is different in the views between fibi and coeus. I will have to foward this info to Bob to clarification.

I will also pick a few more grants to see the specific diffs in any of the fields. So far, my analysis has been relatively high level.

rpoet-jh commented 6 months ago

Sent to Bob:

Hi Bob,

Just wanted to send over some details on the data analysis I did comparing fibi and coeus data that PASS will use/uses.

I compared the result sets between fibi and coeus based on the queries PASS executes. I ran the grant queries in coeus and fibi and pulled all rows updated since 2/26/2024.

fibi 4913 total rows returned 3441 after deduplicating rows (~1500 duplicate rows)

coeus 1004 total rows returned removed 13 duplicate rows

I compared the grant_numbers between result sets and found:

fibi has 2179 grants returned that coeus does not coeus has 8 grants returned that fibi does not

I'm not sure if the large difference in rows returned is expected? Could it be that updated_timestamp is associated to some different data between coeus and fibi?

I then did sample grant comparisons. I noticed that for all the grants selected, the co-pis were different, and the award_date was different. The update_timestamp was different as well.

grant_number: 126722 coeus has 0 co-pis fibi has 2 co-pis award_date is different update_timestamp is different

grant_number: 126758 coeus has 2 co-pis fibi has 0 co-pis award_date is different award_end is different update_timestamp is different

grant_number: 126758 coeus has 4 co-pis fibi has 0 co-pis award_date is different update_timestamp is different

grant_number: 130925 coeus has 16 co-pis fibi has 4 co-pis award_date is different update_timestamp is different