ahirner / TabulaRazr-OS

Extract tabular data and semantically discover it with ease! (OS)
GNU Affero General Public License v3.0

For XIRR Calculation, month and day for cashflows 1 through x should come from debt service table header #3

Open joffemd opened 8 years ago

joffemd commented 8 years ago

At http://tabularazr.eastus.cloudapp.azure.com:7081/calculate_xirr/muni_bonds/ER942906-ER737178-ER1138804.pdf.txt I see that the month and day of every XIRR cash flow other than cash flow zero are taken from today's date. This isn't correct. The cash flow dates should use the month and day given in the debt service table header. If that isn't available, use the month and day of the bond's maturity date. Otherwise, default to August 1, which is the most common value I have seen.
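The order of preference described above can be sketched as a small helper. This is only an illustration, not TabulaRazr's actual code: the names `pick_month_day` and `cashflow_dates` are hypothetical, the header month and day are assumed to arrive as an already-parsed `(month, day)` tuple, and clamping the day to the month's length (for a February 29 date in a non-leap year) is an added assumption.

```python
import calendar
from datetime import date
from typing import Iterable, List, Optional, Tuple

# August 1 is the last-resort default suggested in this issue.
DEFAULT_MONTH_DAY = (8, 1)


def pick_month_day(header_month_day: Optional[Tuple[int, int]],
                   maturity_date: Optional[date]) -> Tuple[int, int]:
    """Choose the month and day for cash flows 1..x (hypothetical helper)."""
    if header_month_day is not None:
        # First choice: month and day from the debt service table header.
        return header_month_day
    if maturity_date is not None:
        # Second choice: month and day of the bond's maturity date.
        return (maturity_date.month, maturity_date.day)
    return DEFAULT_MONTH_DAY


def cashflow_dates(years: Iterable[int],
                   header_month_day: Optional[Tuple[int, int]] = None,
                   maturity_date: Optional[date] = None) -> List[date]:
    """Build one cash flow date per year; today's date is never used."""
    month, day = pick_month_day(header_month_day, maturity_date)
    dates = []
    for year in years:
        # Assumption: clamp the day so Feb 29 becomes Feb 28 in non-leap years.
        last_day = calendar.monthrange(year, month)[1]
        dates.append(date(year, month, min(day, last_day)))
    return dates
```

For example, `cashflow_dates([2017, 2018], header_month_day=(6, 1))` yields June 1 of each year, and calling it with neither a header value nor a maturity date falls back to August 1.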

ahirner commented 8 years ago

Ack, that's because date recognition falls back to today's date by default (currently without any warning). The solution is exactly as pointed out above.
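One way to avoid a silent fallback to today's date is for the recognizer to return nothing when no month and day are present, and to log a warning so the caller can pick the next fallback itself. A minimal sketch, assuming the text uses full English month names; `find_month_day` and its regular expression are hypothetical and are not the project's actual date recognition.

```python
import logging
import re
from typing import Optional, Tuple

logger = logging.getLogger(__name__)

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Month name followed by a one- or two-digit day, e.g. "August 1".
_MONTH_DAY = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})\b", re.IGNORECASE)


def find_month_day(text: str) -> Optional[Tuple[int, int]]:
    """Return (month, day) found in text, or None instead of guessing."""
    match = _MONTH_DAY.search(text)
    if match is not None:
        month = MONTHS[match.group(1).lower()]
        day = int(match.group(2))
        if 1 <= day <= 31:
            return (month, day)
    # Complain instead of silently substituting today's date.
    logger.warning("no month/day recognized in %r", text)
    return None
```

A caller that receives `None` can then move on to the maturity date or the August 1 default rather than inheriting today's month and day.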

Also, down the road, due date recognition needs better triangulation, using the lines above and below keywords to increase reliability, for example when the year is not mentioned, as is the case for line 13 in this document.
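As a rough illustration of looking at lines above and below a keyword, the sketch below searches the keyword's own line first and then its neighbors for a four-digit year. Everything here is an assumption made for the example: the function name `year_near_keyword`, the `window` parameter, and the restriction to years starting with 19 or 20.

```python
import re
from typing import List, Optional

# Assumption: only four-digit years from 1900 to 2099 are of interest.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def year_near_keyword(lines: List[str], keyword: str,
                      window: int = 1) -> Optional[int]:
    """Return a year on the keyword line or within `window` lines of it."""
    for i, line in enumerate(lines):
        if keyword.lower() not in line.lower():
            continue
        # Keyword line first, then neighbors by increasing distance
        # (the line above is checked before the line below).
        for offset in sorted(range(-window, window + 1), key=abs):
            j = i + offset
            if 0 <= j < len(lines):
                match = YEAR.search(lines[j])
                if match:
                    return int(match.group())
    return None
```

Returning `None` when no nearby line carries a year keeps the decision about a fallback with the caller.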

joffemd commented 8 years ago

If you can’t find the year somewhere on the first page, it can be inferred with reasonable accuracy from the debt service table. Under certain circumstances, it would be the same year as the first year in this table. In other circumstances, it would be one year earlier. I could spell these circumstances out if and when we need to do this work.

(Sent by email in reply to Alexander Hirner's notification of Sunday, February 28, 2016, 11:12 AM.)
