BCHSI / philter-ucsf

Open source clinical text de-identification
BSD 3-Clause "New" or "Revised" License

[feature request] ignoring input files #4

Closed (gknor closed 4 years ago)

gknor commented 4 years ago

Hi, I use DVC (https://dvc.org/) for data file version control in my project. DVC creates a special text metafile with the .dvc file extension. When I run Philter, I get an error related to these DVC files:

  File ".../python3.6/site-packages/philter_ucsf/philter.py", line 800, in transform
    contents = self.transform_text_i2b2(self.data_all_files[filename])
KeyError: '.../sample_notes/1.txt.dvc'

It would be nice to have an option for skipping files with a given file extension (for example, .dvc).

Best regards, Grzegorz
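
For illustration, the requested behaviour could be sketched as a small pre-filtering step that collects input files while skipping unwanted extensions. This is only a sketch under stated assumptions: `list_input_files` and its `skip_extensions` parameter are hypothetical names, not part of the philter_ucsf API, and it does not describe how the project actually handles such files.

```python
import os


def list_input_files(input_dir, skip_extensions=(".dvc",)):
    """Return sorted paths of regular files in input_dir.

    Hypothetical helper (not part of philter_ucsf): files whose names end
    with any entry in skip_extensions are left out. Matching is
    case-insensitive.
    """
    skip = tuple(ext.lower() for ext in skip_extensions)
    selected = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        # Ignore subdirectories and other non-regular entries.
        if not os.path.isfile(path):
            continue
        # str.endswith accepts a tuple of suffixes.
        if name.lower().endswith(skip):
            continue
        selected.append(path)
    return selected
```

With a helper like this, a directory containing both `1.txt` and `1.txt.dvc` would yield only the `1.txt` path, so the DVC metafile never reaches the de-identification step.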

kmuenzen commented 4 years ago

Hi @gknor, this issue should be fixed by the latest commit. Hope that helps!