datadesk / python-elections

A Python wrapper for the Associated Press' U.S. election data service.
python-elections.rtfd.org

Testing against local files #96

Open veltman opened 10 years ago

veltman commented 10 years ago

Is there a simple way I'm missing to test against local files instead of hitting the AP live each time? When the AP runs its tests, I typically save the raw files by minute so I can spot-check them later. I tried modifying the lib for that, but in the end it was easier to just set up a dummy FTP server to serve the local files.

palewire commented 10 years ago

I never even thought of this idea. What do you think would be a good way of pulling it off?

veltman commented 10 years ago

The way that changes the least would probably be kwargs where, if a path is given for a particular file, it uses that instead of fetching it, like:

# Fetch all the files
AP("username", "password")

# Fetch them all locally
AP("", "", results_file="blah.txt", race_file="blah.txt", reporting_unit_file="blah.txt", candidate_file="blah.txt")

# Fetch results but use local for the rest
AP("username", "password", race_file="blah.txt", reporting_unit_file="blah.txt", candidate_file="blah.txt")
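A rough sketch of how that per-file override could behave, for illustration only. The helper `open_source`, the `local_files` mapping and the `fake_remote` stand-in are hypothetical names, not part of python-elections; the sketch only shows the "use the local path if one was given, otherwise fetch" decision.

```python
import io
import os
import tempfile


def open_source(kind, local_files, fetch_remote):
    """Return a file-like object for one input, preferring a local file.

    `kind`, `local_files` and `fetch_remote` are hypothetical names used
    for this sketch; they are not the library's API.
    """
    path = local_files.get(kind)
    if path:
        # A local path was supplied for this input: read it, skip the network.
        with open(path, encoding="utf-8") as handle:
            return io.StringIO(handle.read())
    # No override given: fall back to the normal remote fetch.
    return fetch_remote(kind)


def fake_remote(kind):
    """Stand-in for the live fetch so the sketch runs offline."""
    return io.StringIO("remote:" + kind)


with tempfile.TemporaryDirectory() as tmp:
    race_path = os.path.join(tmp, "blah.txt")
    with open(race_path, "w", encoding="utf-8") as handle:
        handle.write("saved race data")

    # Only race_file is overridden; results_file still goes "remote".
    overrides = {"race_file": race_path}
    local_text = open_source("race_file", overrides, fake_remote).read()
    remote_text = open_source("results_file", overrides, fake_remote).read()
```

Keeping the decision in one small function means the parsing code downstream would not need to know where the data came from.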

palewire commented 10 years ago

Hmm. Though there are different files depending on what "result collection" you are reaching for. Compare the state-level results (more commonly used for primaries like we have in this midterm) with the "top of the ticket" and "summary" stuff that is used for the presidential election and balance of power stuff in the Congress.

I wonder if caching the files in the same file structure, in a dot folder or in tmp, could serve as a "cached" replacement for the FTP source. If so, maybe a simple "use_local_cache" option could be passed to the top-level API.
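One way the caching idea might look, as a minimal sketch under stated assumptions. `CachedSource`, its `fetch` method and `fake_remote` are hypothetical; only the "use_local_cache" name comes from the comment above, and the file name `example.txt` is made up for the demo.

```python
import os
import tempfile


class CachedSource:
    """Hypothetical wrapper that mirrors fetched files into a cache folder."""

    def __init__(self, fetch_remote, cache_dir, use_local_cache=False):
        self.fetch_remote = fetch_remote
        self.cache_dir = cache_dir
        self.use_local_cache = use_local_cache

    def _local_path(self, remote_path):
        # Keep the remote directory structure underneath the cache folder.
        return os.path.join(self.cache_dir, *remote_path.strip("/").split("/"))

    def fetch(self, remote_path):
        local = self._local_path(remote_path)
        if self.use_local_cache:
            # Offline mode: read the saved copy and never touch the remote.
            with open(local, encoding="utf-8") as handle:
                return handle.read()
        # Live mode: fetch, then save a copy for later offline runs.
        text = self.fetch_remote(remote_path)
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "w", encoding="utf-8") as handle:
            handle.write(text)
        return text


calls = []


def fake_remote(remote_path):
    """Stand-in for the FTP download; records each call it receives."""
    calls.append(remote_path)
    return "payload for " + remote_path


with tempfile.TemporaryDirectory() as cache_dir:
    live = CachedSource(fake_remote, cache_dir)
    first = live.fetch("/NY/flat/example.txt")

    offline = CachedSource(fake_remote, cache_dir, use_local_cache=True)
    second = offline.fetch("/NY/flat/example.txt")
```

In this sketch the second read comes from disk, so the stand-in remote is only called once.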

veltman commented 10 years ago

Yeah, if you could just supply a local folder path that would contain a structure mimicking the FTP root, that would also do the trick, e.g.:

AP("", "", local_path="/projects/election/test-data-2014-05-25/")

Where that folder contains:

inits/NY/
NY/flat/
Delegate_Tracking/
etc.
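For illustration, the lookup against such a folder could be sketched as below. `resolve_local` is a hypothetical helper, and the file name `example.txt` is invented for the demo; the only thing taken from the comment above is the idea that the folder mirrors the FTP root.

```python
import tempfile
from pathlib import Path


def resolve_local(local_path, ftp_path):
    """Map an FTP-style path onto a folder that mirrors the FTP root.

    Hypothetical helper for this sketch; raises ValueError if the path
    would end up outside the local folder.
    """
    root = Path(local_path).resolve()
    candidate = root.joinpath(*ftp_path.strip("/").split("/")).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError("path escapes the local folder: " + ftp_path)
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny mirror: <tmp>/NY/flat/example.txt
    flat_dir = Path(tmp, "NY", "flat")
    flat_dir.mkdir(parents=True)
    (flat_dir / "example.txt").write_text("saved results", encoding="utf-8")

    found = resolve_local(tmp, "/NY/flat/example.txt").read_text(encoding="utf-8")

    # A path that climbs out of the mirror is rejected.
    try:
        resolve_local(tmp, "/../outside.txt")
        escaped = True
    except ValueError:
        escaped = False
```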

palewire commented 10 years ago

I like this feature request, but I doubt I'll have time to work on it until we get into work for the November elections. The reason is that on our end we store each data pull in a database, which is then served to our in-development maps and results pages, so we don't have to tax the AP system much as part of our iterations. So if anybody out there in Gitland wants to take a shot, please do.

ghing commented 9 years ago

I'm thinking about this too. My idea is to subclass the AP class and override the _fetch method so that it reads the file from the filesystem and returns a StringIO object. All other parts of the stack would run the same code, so it seems like a reliable way to test. I'm going to give this a shot and let you all know how it goes. If it works well, the core functionality could be factored into an APClientBase class. AP and a new FilesystemAP class could then both inherit from APClientBase, and this could be documented.
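The shape of that refactor could be sketched roughly as follows. This is not the library's code: `APClientBase` here is a self-contained stand-in, the `_fetch(path)` signature and the `read_lines` method are assumptions, and `example.txt` is a made-up file name.

```python
import io
import os
import tempfile


class APClientBase:
    """Stand-in for the shared logic; not the library's real class."""

    def _fetch(self, path):
        # Subclasses decide where the data comes from (FTP, disk, ...).
        raise NotImplementedError

    def read_lines(self, path):
        # Hypothetical shared method: everything past _fetch is identical
        # no matter which subclass supplied the data.
        return self._fetch(path).read().splitlines()


class FilesystemAP(APClientBase):
    """Reads files from a local folder that mirrors the FTP root."""

    def __init__(self, local_root):
        self.local_root = local_root

    def _fetch(self, path):
        full = os.path.join(self.local_root, *path.strip("/").split("/"))
        with open(full, encoding="utf-8") as handle:
            # Return a StringIO, as the FTP-backed version is described as doing.
            return io.StringIO(handle.read())


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "NY", "flat"))
    sample = os.path.join(tmp, "NY", "flat", "example.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("row one\nrow two\n")

    client = FilesystemAP(tmp)
    lines = client.read_lines("/NY/flat/example.txt")
```

Because only `_fetch` differs between the two subclasses, tests run against saved files would exercise the same parsing path as a live run.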

palewire commented 9 years ago

Something like that could probably work.

ghing commented 9 years ago

Here's a working, but fairly limited, example of testing the parsing capabilities of the library by replacing the AP._fetch() method with one that reads the file from the filesystem instead of using the FTP client.

https://gist.github.com/ghing/9b7f8fa9bc66db31d600

To me this is simple enough to do that it doesn't warrant adding this feature to the core library.