Closed: passing2961 closed this issue 4 years ago
Hi Young-Jun,
Those results look reasonable. Keep in mind that the purpose of that script is just to split the text into sentences; the BERT LM pretraining scripts take care of the tokenization.
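For anyone landing here later, here is a minimal sketch of what the sentence-splitting step looks like in practice. It uses NLTK's `sent_tokenize` rather than the repo's actual implementation, `split_note_into_sentences` is a hypothetical helper name, and the sample note is entirely synthetic (no MIMIC data):

```python
# A minimal sketch of the sentence-splitting step. This is NOT
# format_mimic_for_BERT.py; it only illustrates the idea of turning
# each note into one sentence per line for the pretraining scripts.
from nltk.tokenize import sent_tokenize
import nltk

nltk.download("punkt", quiet=True)  # fetch the punkt sentence model once


def split_note_into_sentences(note_text: str) -> list[str]:
    """Return the sentences of one note, one string per sentence."""
    return sent_tokenize(note_text)


# Synthetic, non-PHI example text (not MIMIC data):
synthetic_note = (
    "Patient admitted with chest pain. EKG was unremarkable. "
    "Discharged home in stable condition."
)
for sentence in split_note_into_sentences(synthetic_note):
    print(sentence)  # prints one sentence per line
```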
Since the MIMIC data is protected by a DUA, I ask that you please remove the examples from your question above ASAP. Data from MIMIC (even if only small paragraphs) should not be posted publicly on the web without permission. Thanks.
Dear Emily,
Thanks for the kind reply. As you requested, I have removed the examples.
Sorry about that; I will be more careful next time.
Thanks for removing. I'm closing this issue, but feel free to reopen if your question wasn't fully addressed.
Hi Emily,
After acquiring access to the MIMIC-III database, I preprocessed the data following your procedure (i.e., format_mimic_for_BERT.py).
However, I am not confident about the results below. Are they correct?
(output after running format_mimic_for_BERT.py; example sentences removed per the DUA)
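For illustration only, a synthetic, non-MIMIC snippet showing the general shape such output takes: one sentence per line, with notes separated by a blank line, which is the input format BERT's create_pretraining_data.py expects.

```text
Patient admitted with chest pain.
EKG was unremarkable.
Discharged home in stable condition.

Follow-up visit scheduled in two weeks.
Labs pending at discharge.
```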
Thanks, Young-Jun