astrofrog / robo-ph

#dotastro hack

An implementation for substitution of latex with regexs. #31

Open jason-neal opened 8 years ago

jason-neal commented 8 years ago

Hi there,

I have implemented a way to replace the pesky LaTeX commands using regular expressions. I am using a similar method in my own project, which aims at text-to-speech (TTS) for entire arXiv articles (hence the large regex collection).

Please let me know if there are other substitutions you want added or removed. More will need to be added in the future as I find cases I haven't caught yet.

You will need to check that this actually works with roboph, as I am unable to run the main scripts because they rely on Apple-exclusive features.

Enjoy, Jason

astrofrog commented 8 years ago

@jason-neal - thanks for the contribution, and sorry for not replying sooner! (I'm in the middle of a move). I'll test this out now and will merge if it all works :)

jason-neal commented 8 years ago

With regular expressions there are multiple possibilities. I think it could be better to just include brackets () around the last [^A-Za-z] and include it as a second group at the end \g<2>. Also I think the .* should be removed from the front.

So I think it could be changed to (r"([^A-Za-z])pc([^A-Za-z])", r"\g<1> parsec\g<2>")

Note: [^A-Za-z] matches anything that is not a letter, which I have used to avoid matching the units in the middle of words.
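
For reference, here is a minimal sketch of how a two-group rule of this kind behaves with Python's `re.sub`. The `UNIT_RULES` list and the `expand_units` helper are hypothetical names used only for illustration; they are not taken from roboph, and only the `pc` rule comes from this thread.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs.
# Only the "pc" rule is from the discussion; the structure is an assumption.
UNIT_RULES = [
    # Group 1 and group 2 capture the non-letter characters on either
    # side of "pc" so they can be put back after the substitution.
    (r"([^A-Za-z])pc([^A-Za-z])", r"\g<1> parsec\g<2>"),
]


def expand_units(text):
    """Apply each substitution rule in order and return the result."""
    for pattern, replacement in UNIT_RULES:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    # "pc" between a digit and a comma is expanded.
    print(expand_units("a distance of 10pc, roughly"))
    # "pc" inside a word ("upcoming") is left alone.
    print(expand_units("the upcoming survey"))
```

Because both groups must consume a character, a unit at the very start or end of the string is not matched by this pattern. If that matters, lookarounds such as `(?<![A-Za-z])pc(?![A-Za-z])` might be an alternative worth testing.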