Open rpoet-jh opened 1 year ago
From Bob:
Database: MySQL
Server Name: sdfibi.johnshopkins.edu
Schema: jhufibi
Port: 3306
User Name: jhu_data_user
Password:
Test Query:
SELECT SPONSOR_NAME, SPONSOR_CODE FROM jhufibi. jhu_sponsor_view WHERE SPONSOR_CODE IN (‘300865’);
Bob is still working on rewriting our existing query, but this is a sample and connection details.
I sent Bob an email as follows:
We are starting to work on the change in PASS to transition from Coeus to Fibi. At this point, I have two questions:
moving to blocked until you hear back from Bob. also moving to post 1.2.0 since there is little chance this can get done before code freeze.
SQL has arrived from Bob:
Connection details to fibi training instance are in lastpass.
SELECT A.AWARD_ID
,A.AWARD_STATUS
,A.SAP_GRANT_NUMBER
,A.TITLE
,A.AWARD_DATE
,A.AWARD_START_DATE
,A.AWARD_END_DATE
,A.SPONSOR_NAME
,A.SPONSOR_CODE
,A.UPDATE_TIMESTAMP
,B.ROLE
,B.EMPLOYEE_ID
,C.FIRST_NAME
,C.MIDDLE_NAME
,C.LAST_NAME
,C.EMAIL_ADDRESS
,C.JHED_ID
,A.PRIME_SPONSOR_NAME
,A.PRIME_SPONSOR_CODE
FROM JHU_FIBI_AWD_VIEW A
,JHU_FIBI_AWD_INV_VIEW B
,JHU_PERSON_VIEW C
WHERE A.UPDATE_TIMESTAMP > STR_TO_DATE('02/01/2024', '%m/%d/%Y')
AND STR_TO_DATE(A.AWARD_END_DATE, '%m/%d/%Y') >= STR_TO_DATE('02/07/2024', '%m/%d/%Y')
AND B.ROLE != 'KP'
AND A.SAP_GRANT_NUMBER = B.SAP_GRANT_NUMBER
AND B.EMPLOYEE_ID = C.EMPLOYEE_ID;
SELECT FIRST_NAME
,MIDDLE_NAME
,LAST_NAME
,EMAIL_ADDRESS
,USER_NAME AS JHED_ID
,PERSON_ID AS EMPLOYEE_ID
,UPDATE_TIMESTAMP
FROM PERSON
WHERE UPDATE_TIMESTAMP > '2024-01-01';
SELECT SPONSOR_NAME
,SPONSOR_CODE
FROM JHU_SPONSOR_VIEW
WHERE SPONSOR_CODE IN ('300865', '300866');
Started on this, sent the following email to Bob:
Hi Bob,
I was able to connect to the training instance of fibi and execute the queries successfully. Based on that exercise, I have the following questions:
I noticed many duplicated rows in the result set. Seems like in most cases, it is because there are duplicate rows for a grant in JHU_FIBI_AWD_VIEW:
-- returns 9 rows, rows are triplicated because there are triplicate rows in JHU_FIBI_AWD_VIEW for 125027 SELECT * FROM JHU_FIBI_AWD_VIEW A ,JHU_FIBI_AWD_INV_VIEW B ,JHU_PERSON_VIEW C WHERE A.SAP_GRANT_NUMBER = B.SAP_GRANT_NUMBER AND B.EMPLOYEE_ID = C.EMPLOYEE_ID AND A.sap_grant_number = '125027';
-- returns 3 duplicate rows, with distinct, returns 1 row SELECT * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027'
Are all coeus grants/funder/person data in fibi? I didn't see anything that would lead me to believe it's not, but I wanted to confirm if we need to do anything to the data we have in our PASS database to start using fibi? For example, all terminated grants from a particular date were not migrated to fibi, so we should terminates this set of grants before using fibi...
Did the direct or primary sponsor name or codes change in fibi compared to coeus. I did a random sampling, and they appear to be the same, but just wanted to confirm.
Please confirm the SPONSOR_NAME and SPONSOR_CODE are for the Direct Funder, and PRIME_SPONSOR_NAME and PRIME_SPONSOR_CODE are for the Primary Funder.
What are the possible values for JHU_FIBI_AWD_INV_VIEW.ROLE: PI = Primary Investigator, Co-I = Co-Investigator?
What are the possible values for JHU_FIBI_AWD_VIEW.AWARD_STATUS: Active, Terminated
The JHU_FIBI_AWD_VIEW.AWARD_DATE format is dd/mm/yyyy, just confirming since it is different that award start and end date format (mm/dd/yyyy)
After we clarify the questions above, I'd like to get access to the PRODUCTION instance of Fibi so I can do some more data analysis with the live data of fibi.
@rpoet-jh should we block this until we hear back from Bob?
Bob replied with:
I'll investigate to determine if more follow-up is needed
Sent the following reply to Bob:
Thanks for the reply. I'm good with all the answers except question 1.
I tried adding AND b.principal_investigator_flag = ‘Y’
to the following query:
SELECT * FROM JHU_FIBI_AWD_VIEW A ,JHU_FIBI_AWD_INV_VIEW B ,JHU_PERSON_VIEW C WHERE A.SAP_GRANT_NUMBER = B.SAP_GRANT_NUMBER AND B.EMPLOYEE_ID = C.EMPLOYEE_ID AND b.principal_investigator_flag = 'Y' AND A.sap_grant_number = '125027';
but I get back 3 rows for the same PI.
The issue is in JHU_FIBI_AWD_VIEW I believe. If I run the following query, I see 3 duplicate rows returned:
SELECT * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027';
If I add distinct, only 1 row is returned:
SELECT distinct * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027';
More communication I sent to Bob:
Follow-up on the query duplicate question. Looks like the duplicate issue may only be an issue in the training fibi instance. I connected to PROD fibi, and ran the following query and one row returned:
SELECT * FROM JHU_FIBI_AWD_VIEW A WHERE A.sap_grant_number = '125027';
More communication I sent to Bob:
Sorry for the multiple emails, but I still see the duplicates in PROD fibi. I just didn't see them for that grant in that view. I now see the duplicate rows in JHU_FIBI_AWD_INV_VIEW.
I see many PIs that are duplicated in the JHU_FIBI_AWD_INV_VIEW view. See this query for an example:
SELECT * FROM JHU_FIBI_AWD_INV_VIEW A WHERE A.sap_grant_number = '114501';
Couple conclusions on fibi so far: -I'm bugging Bob about the many duplicate rows I see in the Fibi views we will use. This isn't great, and hopefully they can fix the issue to make processing more efficient. However, it isn't a blocker for us to move forward since grant loader needs to handle duplicate rows anyway in it's processing logic. -Fibi queries are relatively slow in PROD. I'm running a query to get all updated rows since 2/20/24, and it is taking 5 minutes to return the first 500 rows. Once again, this isn't the end of the world, but it seems like it should be faster. I'm not sure if I should complain about this to Bob or just live with it?
Here are a few more Fibi/Coeus data analysis points. I ran the grant pull query to get all rows updated since 2/26/2024. I see major differences in the result sets.
fibi 4913 total rows with duplicates 3441 deduped
coeus 1004 there were 13 duplicates
grant 124878 coeus has 6 co-pis fibi has no co-pis
grant row diffs fibi has 2179 grants returned that coeus does not coeus has 8 grants returned that fibi does not
I'm not sure what to make of these differences, but they seem significant. They large diffs in total rows returned is surprising. i'm not sure which is correct, could it be the semantics of updated_timestamp is different in the views between fibi and coeus. I will have to foward this info to Bob to clarification.
I will also pick a few more grants to see the specific diffs in any of the fields. So far, my analysis has been relatively high level.
Sent to Bob:
Hi Bob,
Just wanted to send over some details on the data analysis I did comparing fibi and coeus data that PASS will use/uses.
I compared the result sets between fibi and coeus based on the queries PASS executes. I ran the grant queries in coeus and fibi and pulled all rows updated since 2/26/2024.
fibi 4913 total rows returned 3441 after deduplicating rows (~1500 duplicate rows)
coeus 1004 total rows returned removed 13 duplicate rows
I compared the grant_numbers between result sets and found:
fibi has 2179 grants returned that coeus does not coeus has 8 grants returned that fibi does not
I'm not sure if the large difference in rows returned is expected? Could it be that updated_timestamp is associated to some different data between coeus and fibi?
I then did sample grant comparisons. I noticed that for all the grants selected, the co-pis were different, and the award_date was different. The update_timestamp was different as well.
grant_number: 126722 coeus has 0 co-pis fibi has 2 co-pis award_date is different update_timestamp is different
grant_number: 126758 coeus has 2 co-pis fibi has 0 co-pis award_date is different award_end is different update_timestamp is different
grant_number: 126758 coeus has 4 co-pis fibi has 0 co-pis award_date is different update_timestamp is different
grant_number: 130925 coeus has 16 co-pis fibi has 4 co-pis award_date is different update_timestamp is different
Summary from meeting with Bob/ORIS for Coeus -> Fibi migration:
Bob stated that based on the SQL that was sent, he doesn't see any issues at this time. He is going to rewrite the SQL a bit so that is conforms with MySQL SQL and send it over to us. We should expect the new queries in a few weeks (he may get to them this week, but he is out next week).
There will be new a connection string and credentials, Bob will send those too.
Bob suggested that we take the updated SQL, and test it against their test system before Oct. 23 when Fibi PROD is going live.
Bob recommended that we should schedule our systems to update to use Fibi as soon as possible after Oct. 23 so that our data remains current.