Open jrlegrand opened 3 years ago
Could the "Active Start Date" in RxNav be an easier (and possibly more accurate) substitution here? I say possibly more accurate because some drugs take longer than others to go from FDA approval to market entry.
https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchTerm=airduo --> click on "Status", Active Date = 6/1/17
https://www.asthmaeducators.org/resources/Pictures/AirDuo%20Launch%20Press%20Release.pdf - market launch was ~April 2017? https://www.accessdata.fda.gov/drugsatfda_docs/nda/2017/208799Orig1s000TOC.cfm - FDA approval was Jan 2017
See this Synthea GitHub issue conversation: https://github.com/synthetichealth/synthea/issues/236
@kristentaytok - do you know where that "Active Start Date" metadata is stored within RxNorm?
This looked promising. See the data structure diagram under the accordion dropdown: https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files
Not sure if this helps; I haven't worked much with RxNorm/Nav/Class, but happy to look further into it.
https://rxnav.nlm.nih.gov/api-RxNorm.getRxcuiHistoryStatus.html
@kristentaytok @Bridg109 @yevgenybulochnik - https://rxnav.nlm.nih.gov/REST/rxcui/1918195/historystatus.json (!!!)
@kristentaytok - you beat me to it!
Tylenol says it was active start date of 2005 ... so I don't think this field means what we think it means...
Sad
Hm, FDA only has active NDCs from what I can tell--but it gets updated daily: https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory?elqTrackId=6ecc57b6b7ec4fa7bdcb28cd7a59da1b&elq=795c4bbff787408d9131556bb0434617&elqaid=4397&elqat=1&elqCampaignId=3471
the NDC database files that come from the above link are broken down to product (first 9 digits of NDC) and package (11 digit NDCs). the product file has more details like drug names. but "startmarketingdate" goes back to 1900
@hodgesr2 - can you start looking into what is possible for this?
Basically, I would want to know the very earliest date (or even just a year) a medication product was available in a computer understandable and easily updatable format. Ideally, this would be for more than just active NDCs.
@kristentaytok has started doing some analysis so please reach out to her if you have questions about what she's commented above.
yeah I'll take a look at it and see what I can find.
Also sent a email to the FDA to see if they have some info on any data that may help with is.
I found this... but can't seem to find a CSV with all the information. They are all separate CSVs for each drug entity. But it has an action date which I think translates to approval of the medication on the market.
Orange book has some decent data I think.
Doesn't have NDC, but has ingredient and approval date. However, the approval dates don't go back beyond 1982
LINK to data: https://www.fda.gov/drugs/drug-approvals-and-databases/approved-drug-products-therapeutic-equivalence-evaluations-orange-book
Orange Book Data files compressed
Think I found what may work.
NDC database file. has first two sections of NDC and Start Market Date. Looks accurate from a spot check I did on some drugs.
LINK To data: https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory
Download NDC database file
Let me know if that is what you were looking for
I think @kristentaytok mentioned this above but noted that it's only active NDCs... but maybe this is as close as we are going to get.
In my lisinopril example way up at the top, I do see that the earliest start marketing date is represented as 07/01/2002 in the FDA spreadsheet.
We need join a product-level RXCUI (SCD/GPCK/SBD/BPC) to the NDCPACKAGECODE
in the package.xls file and then join the product.xls file on PRODUCTNDC
and select the earliest date.
@hodgesr2 - you can use this file as an intermediary table to do some testing: https://drive.google.com/file/d/1Nc-Ri2EpDQzebiPZsjXAvBDtodw_BGLc/view?usp=sharing
Let me know if you have questions. We should end up with a file with three columns: MEDICATION_PRODUCT_RXCUI
, MEDICATION_PRODUCT_TTY
, and EARLIEST_START_MARKETING_DATE
.
@kristentaytok - the other question is how we actually implement this. I'm guessing it would be crazy complicated to add this to the CSV distributions... so my other thought is that we do a check AFTER the patient gets randomly assigned a product - check if the year is greater than X, if so - continue on - if not, go back to the ingredient distribution check. We would also want an initial date check to see if we are at least past the earliest date that at least ONE of the meds were available - otherwise, we may get stuck in an infinite loop and break Synthea lol.
OR I guess how hard would it be to add to the CSV (if we can have Date as an attribute)... like if it was before a date a product was available then the distribution would always be 0 for that product. This is also wonky b/c we are trying to apply 2018 distributions to all the years prior to 2018 - back to 1901...
I think the easiest implementation on the MEPS-CSV creation front is to create distributions for each calendar year. So basically any year after the earliest_start_marketing_date to present is given a row, and similar to products that aren't being used in a certain state/age group/gender group, those that have NULLs in a given year will be filled with a zero. So we're just adding a year column to the CSV files that gets aggregated to calculate a unique distribution for each year.
Not sure about how to change the Synthea JSON part of the implementation though. But would this approach for the CSVs help?
I think @kristentaytok mentioned this above but noted that it's only active NDCs... but maybe this is as close as we are going to get.
In my lisinopril example way up at the top, I do see that the earliest start marketing date is represented as 07/01/2002 in the FDA spreadsheet.
We need join a product-level RXCUI (SCD/GPCK/SBD/BPC) to the
NDCPACKAGECODE
in the package.xls file and then join the product.xls file onPRODUCTNDC
and select the earliest date.
I might be missing something here, but i was thinking we just need to use the "PRODUCTNDC" file and join on the first 9 digits of the NDC from the RXCUI_NDC table with the PRODUCTNDC (no need to use the PACKAGE file/NDCPACKAGECODE).
@kristentaytok yeah I think you are right that we wouldn't need the product file - just the package file. And I think the CSV idea you had is what we should do for the implementation if possible. Maybe I'll look into seeing if it's possible to use date in a table transition.
One thought though is that if we want to get to the month or even day level with start dates for med products, that distribution strategy would get even more unwieldy. But year is how I've seen other modules do it so I think it's ok.
Yea I was thinking about the month/day stuff as well, but agree it would get way too big and not worth the extra detail.
It'll be pretty easy to implement the column in the CSV, but my question is for the MDT do we want to download and store daily versions of this file in our db or just connect to the site/link and have it run on the day the user runs the MDT? The disadvantage to the latter is the user may not get the same results if drugs/codes get removed from the file between runs.
For MDT database loading, I think we would treat the FDA NDC file(s) like we do the RxNorm files. We run them once when the user initializes the MDT app, and then give the user the option to periodically refresh all the database tables, or pick specific ones to update (i.e. RxNorm or FDA or MEPS). I'm mostly concerned about not prescribing modern drugs in really ancient years - not about missing cutting edge therapies... unless I'm misunderstanding your concern.
For MDT JSON, we also will probably still want to know the earliest possible year from all the products in the submodule so we can bypass all the logic if the year is earlier than that because we know the patient will not get a match.
I liked the idea of doing a check. To prevent an infinite loop, could we create a random number generator for the year date between a certain range? Like, if check is false, then go to random generated number for the year within a range and recheck? I wonder how much of an extra computational load it would put on it to create the dataset like this.
I'm going to be in Aruba rest of this week and half of next week. I will work on that intermediary table over the weekend probably in between the beach and eating @ brazilian steakhouse and german beer.
@jrlegrand - makes sense given our main goal is to prevent inappropriate med orders from earlier years. My concern was more around reproducibility (and is probably more of an edge case than a major issue): if we ran the NDC file today and the earliest_start_marketing_date for drug X was 1950, for example, and the NDC became obsolete tomorrow and fell off the list such that the next earliest_start_marketing_date for drug X became 1990, then the user running the MDT tomorrow or later will get very different results for drug X vs. what we get today. Hope this makes it more clear?
But again, since the priority is more to address the bigger issue that we're properly prioritizing (preventing med orders from getting created before a drug was available), then this smaller issue may just be more of a caveat that the end-user can consider addressing by downloading and saving the daily FDA files over time and taking the earliest_start_marketing_date across the files.
@kristentaytok - I hear you now. Yeah I feel like this might be more of an edge case but not really sure. Still probably the best data source we have for now.
@jrlegrand do you want the intermediate file to have a date data type for market date as well? Currently it is in integer format with year first.
Yea I think so
At very least we just need to pull year out of that integer
Ok. Cleaned_CSV.csv is loaded into google drive. Let me know if that is what you want. The NDCs may load without the leading zeros in excel. It's annoying. May have to format the column and do a special type of format of "000000000" to get the leading zeroes on. There are 3 date columns: the original, the year, and a date format column.
There are also NULLs for the TTY and RxCUI columns. I didn't know if you wanted those removed or not but I left them in there in case you needed to validate anything.
What software did you use to put this together?
On Mon, May 10, 2021, 5:10 PM Robert Hodges @.***> wrote:
There are also NULLs for the TTY and RxCUI columns. I didn't know if you wanted those removed or not but I left them in there in case you needed to validate anything.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/coderxio/medication-diversification/issues/30#issuecomment-837418246, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUPMCQQ4GJDRLE7VBVT3ZTTNBKT3ANCNFSM424AJBTA .
Excel and Python (Anaconda).
@hodgesr2 - we actually need NDCPACKAGECODE (the PACKAGE NDC) instead of the PRODUCTNDC unless we have a really reliable way of getting from the MEPS package NDCs to product NDCs. Also, could you please pull in the medication_product_name field for human readability? Did you try to join to the RxNorm rows based on PACKAGE or PRODUCT NDC?
k. I'll take a look at it again Friday when I get back. I joined on PRODUCTNDC. I'll try and redo it with NDCPACKAGECODE
And it would be ideal if solely done in python that way we can code the process into automatic process
On Mon, May 10, 2021, 7:52 PM Robert Hodges @.***> wrote:
k. I'll take a look at it again Friday when I get back. I joined on PRODUCTNDC. I'll try and redo it with NDCPACKAGECODE
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/coderxio/medication-diversification/issues/30#issuecomment-837570962, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUPMCRRTXFW2EKHVYGWKWTTNB5TVANCNFSM424AJBTA .
do you want it in a ipynb file that jupyter uses (my preference) or prefer a py file?
Py file would be best but we can convert it from jupyter pretty easy if you want to do it that way
On Mon, May 10, 2021, 8:15 PM Robert Hodges @.***> wrote:
do you want it in a ipynb file that jupyter uses (my preference) or prefer a py file?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/coderxio/medication-diversification/issues/30#issuecomment-837590825, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUPMCUFQHOZE6ZC32WCV4TTNCAJZANCNFSM424AJBTA .
@hodgesr2 - we should also pull in ENDMARKETINGDATE
Reopening because we never really finished this feature.
Synthea has an attribute called time
(I think) that can be used in a transition table. It's the number of seconds since some year.
Problem Statement
If we don't put any date limits on our modules, patients in 1901 are going to be getting modern medications.
Criteria for Success
Some automated way to query FDA to link first available date of drug application to a product RXCUI or NDC.
Additional Information
See example here: https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=overview.process&ApplNo=075743 Original approval date = 07/01/2002
A lot of modules list this in the remarks: https://github.com/synthetichealth/synthea/blob/master/src/main/resources/modules/medications/moderate_opioid_pain_reliever.json