Maksakovsky-admin / technical

Indexing may be feasible #9

Open dentarthur opened 5 years ago

Started specifying requirements and listing references. Paused for now:

https://github.com/thecapitalistcycle/tech-indexing

https://github.com/thecapitalistcycle/tech-indexing/blob/master/references.md

Started following your Wikipedia article on indexing software to look up refs.

Looks very promising: I found a service that can convert a printed index into a form that can be imported into indexing software, at reasonable cost. This would remove the main blockage.

"For purposes of pricing, what constitutes an 'entry'? An entry is every line that ends with a locator (including cross-references). Each entry can have multiple locators."

Can you put up a table (or spreadsheet) with details of what we would be asking for, so we can request a quote?

Scope: the existing indexes for volumes C0 to C4, from the Penguin editions.

E.g. C1 is PDF pp. 1119 to 1934, 15 pages with 2 columns each, with approximately x to y entries per column. Guessing 30 entries per column, that would be 60 entries per page, so 15 x 60 or 900 entries.

The service charges USD $40 per hour for time spent cleaning up input files. We would end up extracting just the index pages from the PDFs and asking for her quote for doing the cleanup from those extracts. Alternatively we do it ourselves, which could be another blockage.

Count the index pages in each volume/book (C0 Grundrisse, Contribution to Critique of PE, 3 vols of Capital, C4 3 parts of TSV) and give a rough estimate of items per page/column.
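A minimal sketch of how the per-volume estimate could be tabulated for the quote request. The class and function names are hypothetical, and the only figures used are the C1 guesses from this comment (15 index pages, 2 columns, 30 entries per column); rows for the other volumes would be added once their pages are counted.

```python
from dataclasses import dataclass


@dataclass
class IndexEstimate:
    """Rough size of one printed index (all figures are guesses until counted)."""
    volume: str
    index_pages: int
    columns_per_page: int
    entries_per_column: int  # assumed average, not a measured value

    @property
    def entries(self) -> int:
        # pages x columns per page x entries per column
        return self.index_pages * self.columns_per_page * self.entries_per_column


def to_table(rows):
    """Render the estimates as a Markdown table that can be pasted into an issue."""
    lines = [
        "| Volume | Index pages | Columns/page | Entries/column (guess) | Estimated entries |",
        "|---|---|---|---|---|",
    ]
    for r in rows:
        lines.append(
            f"| {r.volume} | {r.index_pages} | {r.columns_per_page} "
            f"| {r.entries_per_column} | {r.entries} |"
        )
    return "\n".join(lines)


# C1 figures as guessed above; other volumes are still to be counted.
c1 = IndexEstimate("C1", index_pages=15, columns_per_page=2, entries_per_column=30)
print(to_table([c1]))
```

Because the service prices per entry (every line ending with a locator), replacing the guessed entries per column with a count from a few sample columns would tighten the figure before asking for a quote.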

Also look up the Penguin/Amazon catalogues for EPUB versions, so we can get those pages directly with markup instead of from the PDFs. Include direct links. I am likely to order them if they exist, as they would probably avoid the need for much cleanup.

I haven't finished listing or looking yet. But with the importing part taken care of, there is definitely software available that should make the job feasible for you without being held up waiting for things from me. Take a look and think about it.