UChicago-Coase-Sandor / pacer_lib

http://pacer-lib.readthedocs.org/

Unique Identifier / Listed Identifier #7

Open zhangchuck opened 10 years ago

zhangchuck commented 10 years ago

How are previously downloaded documents named? (I believe it uses the listed identifier)

How are we naming documents that are downloaded now? (I believe we are using the unique identifier)

How should we name downloaded documents? (Probably the unique identifier; maybe prefix with a u/i)
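
One way to picture the prefix idea is sketched below. This is only an illustration, not pacer_lib's actual naming code: the function name `document_filename`, the `case_id` argument, and the `<case>_<prefix><number>.<ext>` layout are all assumptions. The `'U'` and `'L'` codes are the `no_type` values mentioned later in this thread.

```python
def document_filename(case_id, number, no_type="U", ext="pdf"):
    """Build a filename that records which identifier type was used.

    Hypothetical helper: the layout below is an assumption made for
    illustration and is not taken from pacer_lib.
    """
    no_type = no_type.upper()
    if no_type not in ("U", "L"):
        raise ValueError("no_type must be 'U' (unique) or 'L' (listed)")
    # The prefix keeps unique entry 3 and listed entry 3 from colliding.
    return "{0}_{1}{2}.{3}".format(case_id, no_type, number, ext)
```

With a prefix like this, a file saved under the listed identifier and a file saved under the unique identifier can sit in the same folder without overwriting each other, even when the numbers happen to match.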

zhangchuck commented 10 years ago

    (To be implemented) docket_parser() assigns two types of numbers:
    the listed docket number (i.e., the number listed on the page) and
    the unique identifier (i.e., the position of the docket entry on 
    the page). We should default to using the unique identifier, but
    all of the legacy files will be using the listed identifier and we
    will need to reassociate / convert those documents to their unique
    identifier.

    no_type = 'U' --> unique identifier
    no_type = 'L' --> listed identifier

    We have begun implementing this, but this is not completely finished.

    Using the listed identifier should be considered legacy and not advised.

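    Mixing the two identifier types is dangerous for redundant download
    protection: a document saved under one type of identifier will not be
    recognized as already downloaded under the other.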

    Document this properly once we finish.
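
The two numbering schemes and the legacy conversion described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the docket_parser() implementation: the function names, the dictionary keys, and the 1-based position are hypothetical, and the sample data assumes that listed numbers can be missing or repeated on a page.

```python
from collections import defaultdict


def assign_identifiers(listed_numbers):
    """Pair each docket entry's listed number with its position on the page.

    Assumption: the unique identifier is the 1-based position of the entry.
    """
    return [
        {"unique": position, "listed": listed}
        for position, listed in enumerate(listed_numbers, start=1)
    ]


def legacy_to_unique(entries):
    """Map each listed identifier to the unique identifier(s) that carry it."""
    mapping = defaultdict(list)
    for entry in entries:
        # Entries with no listed number cannot match any legacy file.
        if entry["listed"] is not None:
            mapping[entry["listed"]].append(entry["unique"])
    return dict(mapping)


# Made-up page: one entry has no listed number and "2" appears twice.
entries = assign_identifiers(["1", "2", None, "2", "3"])
mapping = legacy_to_unique(entries)

# Listed identifiers that map to more than one entry need manual review.
ambiguous = sorted(k for k, v in mapping.items() if len(v) > 1)
```

A legacy file named after a listed identifier can be renamed automatically only when its mapping has exactly one unique identifier. Anything in `ambiguous` would have to be matched some other way.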

zhangchuck commented 10 years ago

What other functions are affected by the switch to the unique identifier?

Is it primarily stuff that hasn't been written yet (document converter, etc.)?