Closed by 0xKD 2 years ago
Can you please post the partial JSON for the scheme around 5 Jun? I haven't faced such an issue yet, but yes, it could happen if the PDF page boundaries extend or overlap.
{
    "scheme": "Quantum India ESG Equity Fund - Direct Plan Growth",
    "advisor": null,
    "rta_code": "123ESGPG",
    "type": "EQUITY",
    "rta": "KFINTECH",
    "isin": "INF082J01382",
    "amfi": "147372",
    "open": "0.000",
    "close": "8298.919",
    "close_calculated": "8298.919",
    "valuation":
    {
        "date": "2021-07-19",
        "value": "133280.64",
        "nav": "16.06"
    },
    "transactions":
    [
        {
            "date": "2021-06-05",
            "description": "Systematic Investment (1/932)",
            "amount": "65531.72",
            "units": "4198.060",
            "nav": "15.61",
            "balance": "4198.060",
            "type": "PURCHASE_SIP",
            "dividend_rate": null
        },
        {
            "date": "2021-06-05",
            "description": "*** Stamp Duty ***",
            "amount": "3.28",
            "units": null,
            "nav": null,
            "balance": "4198.060",
            "type": "STAMP_DUTY_TAX",
            "dividend_rate": null
        },
        {
            "date": "2021-06-05",
            "description": "*** Stamp Duty ***",
            "amount": "3.28",
            "units": null,
            "nav": null,
            "balance": "4198.060",
            "type": "STAMP_DUTY_TAX",
            "dividend_rate": null
        },
        {
            "date": "2021-07-05",
            "description": "Systematic Investment (2/932)",
            "amount": "65531.72",
            "units": "4100.859",
            "nav": "15.98",
            "balance": "8298.919",
            "type": "PURCHASE_SIP",
            "dividend_rate": null
        },
        {
            "date": "2021-07-05",
            "description": "*** Stamp Duty ***",
            "amount": "3.28",
            "units": null,
            "nav": null,
            "balance": "8298.919",
            "type": "STAMP_DUTY_TAX",
            "dividend_rate": null
        }
    ]
}
Oh, this is indeed weird. Not sure how to resolve it. 😕
A simple duplicate check won't work, since it is possible to have multiple stamp duty transactions on the same day if there are multiple purchase or switch-in transactions.
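Since blind de-duplication is unsafe, one option for users is to flag suspicious dates in the parsed output rather than drop entries. The sketch below is not part of casparser: the function name is hypothetical, and it assumes that each purchase-like transaction produces at most one stamp duty entry and that `"PURCHASE"` and `"SWITCH_IN"` are valid type strings alongside the `"PURCHASE_SIP"` and `"STAMP_DUTY_TAX"` values shown in the JSON above.

```python
from collections import Counter

# Assumption: these type strings mark transactions that attract stamp duty.
# Only "PURCHASE_SIP" and "STAMP_DUTY_TAX" appear in the JSON in this thread.
PURCHASE_TYPES = {"PURCHASE", "PURCHASE_SIP", "SWITCH_IN"}


def suspicious_stamp_duty_dates(transactions, purchase_types=PURCHASE_TYPES):
    """Return dates with more stamp duty entries than purchase-like entries.

    `transactions` is a list of dicts with "date" and "type" keys, as in the
    "transactions" array of the JSON output. Nothing is removed; the result
    is only a list of dates worth checking against the PDF by hand.
    """
    purchases = Counter()
    duties = Counter()
    for txn in transactions:
        if txn["type"] == "STAMP_DUTY_TAX":
            duties[txn["date"]] += 1
        elif txn["type"] in purchase_types:
            purchases[txn["date"]] += 1
    # A Counter returns 0 for dates that have no purchase at all.
    return sorted(date for date, n in duties.items() if n > purchases[date])
```

For the scheme above, this would flag 2021-06-05 (two stamp duty rows against one SIP purchase) and leave 2021-07-05 alone. It still cannot tell a page-boundary duplicate from a statement that legitimately lists extra stamp duty rows, so it is a warning, not a fix.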
Yeah, doing it in casparser isn't reliable.
This is a bug in pdfminer/mupdf, but I thought it would be useful to document, since the implications are somewhat critical if you rely on the output of casparser.
If you have pages that look like this across page boundaries, the transaction at the start of the second page seems to be counted as part of the previous page as well. For me, the `*** Stamp Duty ***` transaction at the start of the second page is counted twice: once as part of the previous page (page 4), and again where it is actually first encountered, on page 5. My guess is that the `mediabox` of the page (used by pdfminer to determine page boundaries) is larger than necessary and extends into the second one.
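One way to probe the `mediabox` guess is to list each page's box and see whether any page stands out. This is a rough sketch, not a confirmed diagnosis: the helper names are made up for illustration, `read_media_boxes` assumes pdfminer.six is installed and uses its `PDFPage.get_pages` and the page's `mediabox` attribute, and the comparison only catches pages whose box differs from the most common one.

```python
from collections import Counter


def read_media_boxes(pdf_path, password=""):
    """Return one mediabox tuple per page (hypothetical helper)."""
    # Lazy import so the pure-Python check below works without pdfminer.six.
    from pdfminer.pdfpage import PDFPage

    with open(pdf_path, "rb") as fp:
        return [tuple(page.mediabox)
                for page in PDFPage.get_pages(fp, password=password)]


def pages_with_unusual_box(boxes):
    """Return 1-based page numbers whose box differs from the most common box."""
    if not boxes:
        return []
    common_box, _count = Counter(boxes).most_common(1)[0]
    return [number for number, box in enumerate(boxes, start=1)
            if box != common_box]
```

Usage would be along the lines of `pages_with_unusual_box(read_media_boxes("cas.pdf", password="..."))`, where the file name is a placeholder. If every page shares the same oversized box, nothing is flagged, so an empty result does not rule the guess out.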