Open leondz opened 13 years ago
No, it's not attempted. Could be useful though.
I've written a small module for managing DCT detection. It works flawlessly on the ~240 docs in TimeBank + ATC, as well as on the 1.8 million in the TAC KBP source collection (save one file which really has no explicit DCT information, and very little inferable either). I'll work on integrating this and adding a fallback "guess DCT from filename / doc content" option for when no value for -c is specified and no other information is available.
It's possible to extract DCT (at day granularity) from filenames - is this attempted?
From TimeBank:
VOA19980331.1700.1533.tml WARNING: Could not determine document creation time, use -c to override
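For illustration, here is a minimal sketch of what guessing a day-granularity DCT from a filename like the one above could look like. It is not the module mentioned in this thread: the function name `guess_dct_from_filename` is hypothetical, and it assumes the date appears as a standalone run of eight digits in YYYYMMDD order, as in the TimeBank example.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: the date is a run of exactly eight digits (YYYYMMDD),
# not embedded in a longer digit sequence.
_DATE_RUN = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def guess_dct_from_filename(filename: str) -> Optional[date]:
    """Return the first valid YYYYMMDD date found in filename, or None."""
    for match in _DATE_RUN.finditer(filename):
        try:
            # strptime rejects impossible dates such as month 13 or day 32.
            return datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
    return None


if __name__ == "__main__":
    print(guess_dct_from_filename("VOA19980331.1700.1533.tml"))
    print(guess_dct_from_filename("wsj_0006.tml"))
```

Filenames with no date-like run, such as `wsj_0006.tml`, return `None`, so a caller would still need the -c override or document content as a fallback.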