JonathanReeve / sanger

Margaret Sanger Papers Project Search Engine

better way of tagging parsed / unparsed files than directories xml_added and xml_queue? #63

Open JonathanReeve opened 9 years ago

JonathanReeve commented 9 years ago

The current parsing system is a little cumbersome, especially from the point of view of version control. Moving files back and forth between these directories may create lots of commit noise.

CathyHajo commented 9 years ago

We don't generally do this anymore. Since the parse program broke, we just put them in xml_added.

JonathanReeve commented 9 years ago

Since the parsing engine is working now (parse2.php, strangely, rather than parse.php), it's probably a good idea to move edited files to xml_queue so that they can be parsed.

Ideally, there could be a better organizational model that might look something like this:

Or one or more of these:

CathyHajo commented 9 years ago

Should I be deleting all the files out of xml_queue once I have committed the changes and synced? What I have been doing is editing in xml_added, then copying the file to xml_queue and committing/syncing, then parsing. It seems to me that this doesn't make a lot of sense. I like your idea of having one directory where all the files live, the way our XML drafts were on Dropbox. Will the Git pull ignore backup files? They change every time the main file changes, and I have to uncheck the boxes so that they aren't copied to the site.

JonathanReeve commented 9 years ago

I think what you're doing sounds fine. You might just have to log into the server directly and pull in changes manually each time you parse, until I can find a better solution. You don't need to delete anything.

Yep, using two directories to keep track of which files have been parsed is not ideal. I'll look into using a database table for that, or some other way of keeping track.
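A rough sketch of what the database-table idea could look like, written in Python purely for illustration (the project's parser is PHP, and none of this is in the repo). It assumes a SQLite table, hypothetically named `parsed_files`, that stores each file's path and a SHA-256 hash of its contents as of the last parse. A file needs parsing if it has no row or its hash has changed, so nothing has to move between directories. The function names and the single-directory layout are assumptions too.

```python
import hashlib
import sqlite3
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def open_tracker(db_path):
    """Open (or create) the tracking database. Table name is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parsed_files ("
        "path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    return conn


def files_needing_parse(conn, xml_dir):
    """List XML files that are new or have changed since they were last parsed."""
    recorded = dict(conn.execute("SELECT path, sha256 FROM parsed_files"))
    return [
        p for p in sorted(Path(xml_dir).glob("*.xml"))
        if recorded.get(str(p)) != file_digest(p)
    ]


def mark_parsed(conn, path):
    """Record the file's current hash after a successful parse."""
    conn.execute(
        "INSERT OR REPLACE INTO parsed_files (path, sha256) VALUES (?, ?)",
        (str(path), file_digest(path)),
    )
    conn.commit()
```

With something like this, the parse step would loop over `files_needing_parse(...)`, parse each file, and call `mark_parsed(...)` on success. Editing a file changes its hash, so it gets picked up again automatically, and the repository history only records real edits rather than moves.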

JonathanReeve commented 9 years ago

Re: .bak files (and .bak.bak files), I guessed that those weren't files that needed to be on the production server, so I just added them to .gitignore so that they won't wind up being committed and parsed.
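For reference, a minimal example of the kind of .gitignore entry that covers both cases. This is an illustration, not necessarily the exact line in the repo:

```
# Editor backup files; the pattern also matches names ending in .bak.bak
*.bak
```

A pattern without a slash matches file names in any directory, and `*.bak` matches any name ending in `.bak`, so `file.xml.bak.bak` is covered as well. Note that .gitignore only affects untracked files: any backup files that were already committed stay tracked until they are removed from the index with `git rm --cached`.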