ambuda-org / ambuda

Main application code for Ambuda, a breakthrough Sanskrit library (ambuda.org)
https://ambuda.org
MIT License

Add option to download text as markdown #71

Open vvasuki opened 2 years ago

vvasuki commented 2 years ago

Could be proofreading texts or published texts.

Desiderata -

akprasad commented 2 years ago

Right now there are three download cases on my roadmap:

For Markdown, I have some questions:

vvasuki commented 2 years ago

For Markdown, I have some questions:

* How do you read a Markdown text today? What programs, etc.
* What would a Markdown download give you that plain text and PDF would not?

A better question might be: "What would a plain text download give you that Markdown doesn't?"

* Do you know others who would prefer to read texts through markdown?

Basically everyone who cares to read LARGE texts as plain text files, and everyone who uses static website generation. I recall that @shreevatsa and @drdhaval2785 have used Markdown.

vvasuki commented 2 years ago

So far, the most convenient way to produce markdown from TEI I've found: https://github.com/sanskrit-coders/doc_curation/blob/master/doc_curation/tei.py (so here it's mostly code reuse with minor modifications).
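
To give a rough idea of what a TEI -> Markdown conversion involves, here is a minimal sketch using only Python's standard library. It is not the `doc_curation` code linked above and not Ambuda's implementation; the function names, the sample document, and the `rend="bold"` / `rend="italic"` values are assumptions made for illustration, and it handles only `div`, `head`, `p`, `lg`/`l`, and `hi`.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def inline_md(elem):
    """Render mixed content, mapping <hi rend=...> to Markdown emphasis."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_md(child)
        if child.tag == TEI_NS + "hi":
            # Assumption: rend="bold" means bold; anything else is italic.
            marker = "**" if child.get("rend", "") == "bold" else "*"
            inner = f"{marker}{inner}{marker}"
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)


def block_md(elem, depth, out):
    """Walk block-level elements, appending Markdown blocks to `out`."""
    for child in elem:
        tag = child.tag.replace(TEI_NS, "")
        if tag == "div":
            # Each nested <div> increases the heading level by one.
            block_md(child, depth + 1, out)
        elif tag == "head":
            out.append("#" * max(depth, 1) + " " + inline_md(child).strip())
        elif tag == "p":
            out.append(" ".join(inline_md(child).split()))
        elif tag == "lg":
            # Verse lines: two trailing spaces force a Markdown line break.
            lines = [inline_md(line).strip() for line in child.iter(TEI_NS + "l")]
            out.append("  \n".join(lines))
        else:
            # Unknown wrappers: descend, but drop their own text in this sketch.
            block_md(child, depth, out)


def tei_to_markdown(tei_xml):
    """Convert a TEI XML string to Markdown (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI_NS}body")
    out = []
    block_md(body if body is not None else root, 0, out)
    return "\n\n".join(out) + "\n"


# Hypothetical sample document, not taken from any Ambuda text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div>
      <head>Chapter 1</head>
      <p>Some <hi rend="bold">bold</hi> and <hi rend="italic">italic</hi> text.</p>
      <lg><l>first line</l><l>second line</l></lg>
    </div>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    print(tei_to_markdown(SAMPLE))
```

Footnotes (`note`) and other TEI elements would need their own cases; a real converter would likely be driven by the project's actual TEI conventions.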

akprasad commented 2 years ago
  • Sections, for one.
  • Better conservation of important formatting (bold, italics, footnotes, etc.)

Why are PDFs inadequate here?

A better question might be: "What would a plain text download give you that Markdown doesn't?"

More useful for NLP applications and text mining. But that's about all that comes to mind.

So far, the most convenient way to produce markdown from TEI I've found

Wonderful! My main concern was that we'd have to maintain the TEI -> XML logic indefinitely going forward. If there's an out-of-the-box solution that someone else will maintain, I think there's no reason not to add it.

vvasuki commented 2 years ago
  • Sections, for one.
  • Better conservation of important formatting (bold, italics, footnotes, etc.)

Why are PDFs inadequate here?

So far, the most convenient way to produce markdown from TEI I've found

Wonderful! My main concern was that we'd have to maintain the TEI -> XML logic indefinitely going forward. If there's an out-of-the-box solution that someone else will maintain, I think there's no reason not to add it.

You mean TEI -> MD? I keep some custom variants of the main TEI stylesheets as well (e.g. sarit here). Maintenance might therefore not be a big issue.

akprasad commented 2 years ago

You mean TEI -> MD?

Yes, my mistake.