Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

fix(openchallenges): dumping script quick fixes #2543

Closed vpchung closed 4 months ago

vpchung commented 4 months ago

Changelog

I have verified that these changes will update the CSVs as expected:

>>> challenges = df[["id", ..., "updated_at"]]
>>> challenges.loc[460, "description"]
'The Food and Drug Administration (FDA) -\u202fCenter for Devices and Radiological Health (CDRH), Sage Bionetworks, and\u202fprecisionFDA call on ...'
>>> 
>>> challenges = (
...     challenges.replace({r"\s+$": "", r"^\s+": ""}, regex=True)  # trim trailing/leading whitespace
...     .replace(r"\n", " ", regex=True)  # replace newlines with a space
...     .replace("'", "''", regex=True)  # double straight single quotes
...     .replace("\u2019", "''", regex=True)  # replace curly right-quote
...     .replace("\u202f", " ", regex=True)  # replace narrow no-break space
...     .replace("\u2060", "", regex=True)  # remove word joiner
... )
>>> challenges.loc[460, "description"]
'The Food and Drug Administration (FDA) - Center for Devices and Radiological Health (CDRH), Sage Bionetworks, and precisionFDA call on ...'
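
To reproduce the check without the full challenges sheet, here is a minimal, self-contained sketch of the same substitution chain. The sample row and the `clean_text` helper are hypothetical and exist only for illustration; they are not part of the dumping script. Note that `regex=True` is kept on every call because, without it, `DataFrame.replace` only matches cells whose entire value equals the search string.

```python
import pandas as pd

# Hypothetical sample row; the real data comes from the challenges sheet.
df = pd.DataFrame(
    {
        "id": [1],
        "description": [
            "  FDA -\u202fCDRH and\u202fSage\u2060 call on\nteams; it\u2019s open  "
        ],
    }
)


def clean_text(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the same substitutions as the chain above (illustrative helper)."""
    return (
        frame.replace({r"\s+$": "", r"^\s+": ""}, regex=True)  # trim whitespace
        .replace(r"\n", " ", regex=True)  # newlines become spaces
        .replace("'", "''", regex=True)  # double straight single quotes
        .replace("\u2019", "''", regex=True)  # curly right-quote
        .replace("\u202f", " ", regex=True)  # narrow no-break space
        .replace("\u2060", "", regex=True)  # word joiner
    )


cleaned = clean_text(df)
print(repr(cleaned.loc[0, "description"]))
```

Non-string columns such as `id` pass through unchanged, since the regex replacements only apply to string values.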