sebbacon closed this issue 6 years ago
Overall looks good, but 3 things:
1. Unclear Sponsor Name Given and Unclear Sponsor Name Given - Medicines Development (Infectious Diseases) should not both have their own lines, I don't think. It should just be each specific Unclear Sponsor Name.
2. I'm unsure, based on the data you provided, why Unclear Sponsor Name Given - Medicines Development (Infectious Diseases) is marked as having "Inconsistent Data".
3. Lilly S.A. probably shouldn't be normalising to "Lilly" but rather to "Eli Lilly", so that's unexpected.
Re. (1): I don't understand what the correct output would look like?
Re. (2): the trial_status field is 4, which means it's a blank trial status (in the code comments it says "a blank trial status usually indicated a paediatric trial taking place wholly outside of the EU/EEA").
Re. (3): that's just because I made up the normalisation spreadsheet row for that trial, so you can ignore
Unclear Sponsor Name Given - Medicines Development (Infectious Diseases) in the rankings. I don't think we're going to need a category for just Unclear Sponsor Name Given, as that, in the normalisation spreadsheet, is acting basically the same as a parent company would for all the individual "Unclear Sponsor Name" trials. It should never appear by itself in normalized_name_only, which is what drives the groupings for the website. It's a useful grouping mechanism for other things, like filtering all the "Unclear Sponsors", and for attaching custom text to it if we eventually want that.
Still not following (1).
If GSK is a parent of Foo Corp, and Foo Corp has a trial, then we show the trial for both GSK and Foo Corp but count it just once in the summary data. Right?
If so, then I don't understand:
don't think we're going to need a category for just Unclear Sponsor Name Given as that, in the normalisation spreadsheet, is acting basically the same as a parent company would be for all the individual "Unclear Sponsor Name" trials
That's not my understanding of how the site functions.
So GSK owns Foo Corp so in the data it looks like:
sponsor_name - Foo Corp LLC
normalized_name_only - Foo Corp
normalized_name - GlaxoSmithKline
That trial appears for Foo Corp and then there is the auto-generated text at the bottom that says: "We think Foo Corp is now effectively part of GlaxoSmithKline"
And on the GlaxoSmithKline page we say "We think GlaxoSmithKline is now also responsible for the trials of: Foo Corp, Bar LLC, etc..."
But my understanding was that the Foo Corp trial still lived just under Foo Corp if it was assigned to it in normalized_name_only, and, to avoid misattribution since my M&A research might not be foolproof, we just use that text at the bottom to link the two.
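To make that reading concrete, here is a rough sketch of the grouping as I understand it, not the site's actual code. The column names come from the normalisation spreadsheet as discussed above; the trial_id field, the sample rows, and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows; trial_id is an invented field for this sketch.
ROWS = [
    {"trial_id": "T1", "sponsor_name": "Foo Corp LLC",
     "normalized_name_only": "Foo Corp", "normalized_name": "GlaxoSmithKline"},
    {"trial_id": "T2", "sponsor_name": "Bar LLC",
     "normalized_name_only": "Bar LLC", "normalized_name": "GlaxoSmithKline"},
    {"trial_id": "T3", "sponsor_name": "GlaxoSmithKline",
     "normalized_name_only": "GlaxoSmithKline", "normalized_name": "GlaxoSmithKline"},
]


def group_rows(rows):
    """Group trials by normalized_name_only and record each page's parent."""
    trials_by_page = defaultdict(list)
    parent_of = {}
    for row in rows:
        page = row["normalized_name_only"]
        trials_by_page[page].append(row["trial_id"])
        # A differing normalized_name only links the pages; the trial is not moved.
        if row["normalized_name"] != page:
            parent_of[page] = row["normalized_name"]
    return dict(trials_by_page), parent_of


def footer_text(page, parent_of):
    """Build the linking text shown at the bottom of a sponsor page."""
    if page in parent_of:
        return f"We think {page} is now effectively part of {parent_of[page]}"
    children = sorted(c for c, p in parent_of.items() if p == page)
    if children:
        return (f"We think {page} is now also responsible for the trials of: "
                f"{', '.join(children)}")
    return ""
```

Under this reading, T1 is listed only on the Foo Corp page, and the GlaxoSmithKline page mentions Foo Corp in its footer text without taking over the trial.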
If the data said something like:
sponsor_name - Foo Bar LLC (a GSK Company)
normalized_name_only - GlaxoSmithKline
normalized_name - GlaxoSmithKline
Then Foo Bar LLC would not have an entry on the website (though I have a feature request to make it so that if you search for Foo Bar LLC, you would get to GSK's page, but that's a separate thing).
If the trial is sponsored by both GSK and Foo Corp (as in, both are listed as sponsors in one or more of the trial's country records), it would appear on both sponsors' pages on the website and only count once for the overall stats.
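A small sketch of that counting rule, again only my assumption about the intended behaviour rather than the real implementation. The (trial_id, normalized_name_only) pairs and the summarise function are hypothetical.

```python
from collections import defaultdict

# Hypothetical (trial_id, normalized_name_only) pairs; T1 lists two sponsors.
SPONSORSHIPS = [
    ("T1", "GlaxoSmithKline"),
    ("T1", "Foo Corp"),
    ("T2", "Foo Corp"),
]


def summarise(sponsorships):
    """Return per-sponsor trial lists and the overall count of distinct trials."""
    trials_by_page = defaultdict(set)
    for trial_id, page in sponsorships:
        trials_by_page[page].add(trial_id)
    # The overall total deduplicates on trial_id, so a shared trial counts once.
    total = len({trial_id for trial_id, _ in sponsorships})
    return {page: sorted(ids) for page, ids in trials_by_page.items()}, total
```

Here T1 shows up on both sponsor pages, while the overall total is the number of distinct trial IDs rather than the sum of the per-page counts.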
Using this data, the home page now shows the following:
@NickCEBM, please could you review and confirm this is as expected?