@ThomasdenH I would appreciate it if you could have a look at #81 and see whether that approach works for your CSV importers. It should make it easy to cover all the use cases the current CSV importer covers, and many more. For an example, see the newcsvbank.py importer in the examples. In particular, the new CSV importer base class has a skiplines class variable to define how many lines to skip at the beginning of the file, and a names class variable to define whether the first row contains column names or not.
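For illustration, a minimal sketch of what such an importer can look like (the column names, date format, and file layout are assumptions for the example, not taken from newcsvbank.py):

from beangulp import csvbase

class MyBankImporter(csvbase.Importer):
    # Skip a two-line preamble before the data starts (hypothetical layout).
    skiplines = 2
    # The first remaining row contains the column names.
    names = True
    date = csvbase.Date('Date', '%Y-%m-%d')
    narration = csvbase.Column('Description')
    amount = csvbase.Amount('Amount')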
Thanks! The new importer fits my needs well.
Thank you for having a look. I didn't mean it to be a replacement for fixing the existing importer, and thus for this PR. However, I tend toward making csvbase.Importer the recommended solution, deprecating csv.Importer, and removing it after the first release of beangulp. Do you have any comments on the extension interface of csvbase.Importer? I would appreciate some feedback before I merge it into beangulp.
finalize accepted None as a return type to indicate that a transaction should be removed.
I couldn't really figure out how to use the filing and identify mixins, but that is probably due to my limited experience with Python.
This is by design. The mixins are considered an awkward API and are going to be phased out with the switch to the new Importer interface. I find it much easier to implement identify() and the archiving related methods directly in the importer. If your experience is different, I would like to hear about it.
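For illustration, implementing these directly can look roughly like this (a sketch; treating filename() as the archiving hook is an assumption about the new interface, not something stated here):

import mimetypes

from beangulp import csvbase

class Importer(csvbase.Importer):
    def identify(self, filepath):
        # Accept files whose guessed MIME type is CSV.
        mimetype, encoding = mimetypes.guess_type(filepath)
        return mimetype == 'text/csv'

    def filename(self, filepath):
        # Archiving hook: the name under which the document is filed.
        return 'csvbank.csv'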
I use one CSV format where amounts use a comma as a decimal separator. It was very easy to add a column type to fix that, but it could also be a simple addition to the Amount type.
I thought about it, but I failed to come up with a general API that does not involve passing a parse function to the Column definition, and if you do that, you may as well define a new Column type:
class Amount(csvbase.Amount):
    def parse(self, value):
        # Normalize the decimal separator before the base class parses the value.
        return super().parse(value.replace(',', '.'))
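(Such a column type would then be assigned in the importer class body, e.g. amount = Amount('Amount'), assuming a column named 'Amount'.)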
Do you have an idea for a suitable API?
The old importer filtered out zero-value transactions (I think). This one doesn't, which is probably better, but I would still like some way to filter them out.
I think the old CSV importer did some filtering, indeed, but why would transactions with a zero amount be in the bank statement (or other source) if they do not contain any interesting information? I thus consider the filtering an anti-feature. I think that filtering is more clearly implemented as a second step:
from beancount.core import data
from beancount.core.number import ZERO

class Importer(csvbase.Importer):
    def extract(self, filepath, existing):
        entries = []
        for entry in super().extract(filepath, existing):
            # Drop transactions whose first posting has a zero amount.
            if isinstance(entry, data.Transaction) and entry.postings[0].units.number == ZERO:
                continue
            entries.append(entry)
        return entries
but I may be convinced otherwise.
I find it much easier to implement identify() and the archiving related methods directly in the importer. If your experience is different, I would like to hear about it.
Maybe add a convenience function somewhere to make checking the MIME type doable in one line? Other than that, overriding the methods was indeed easier.
Do you have an idea for a suitable API?
Your example was pretty much how I handled it. My specific case could be solved with a decimal_separator=',' argument in the csvbase.Amount constructor, but I am not sure if that's worth it.
I think that filtering is more clearly implemented as a second step:
In my case this is a good solution. In the general case there may be situations where the transactions for some rows can't even be built. Is that even possible, for example if some attributes are missing for a row? In that case it may be necessary to filter rows before building the transactions.
Maybe add a convenience function somewhere to make checking the MIME type doable in one line?
Currently it is three lines:
mimetype, encoding = mimetypes.guess_type(filepath)
if mimetype == 'text/csv':
    return True
which, losing some readability, can be reduced to
return mimetypes.guess_type(filepath)[0] == 'text/csv'
It would be nicer if guess_type() returned a named tuple, but unfortunately it does not.
It could be simplified to something like
return mimetypes.check_type(filepath, 'text/csv')
but I don't know if it adds enough convenience to make it worth remembering a new API.
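As a sketch, such a helper (hypothetical; check_type() is not part of the mimetypes module) could simply wrap guess_type():

import mimetypes

def check_type(filepath, expected):
    # True when the guessed MIME type matches the expected one.
    mimetype, _ = mimetypes.guess_type(filepath)
    return mimetype == expected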
In my case this is a good solution. In the general case there may be situations where the transactions for some rows can't even be built. Is that even possible, for example if some attributes are missing for a row? In that case it may be necessary to filter rows before building the transactions.
In this case my advice is to override the extract() method entirely. It is pretty straightforward. Alternatively, the read() method can be overridden:
class Importer(csvbase.Importer):
    def read(self, filepath):
        for row in super().read(filepath):
            # Skip rows with an empty amount field.
            if not row.amount:
                continue
            yield row
This has the drawback that you lose the correct line number information in the transaction metadata, but in most applications it does not matter at all (it is not serialized by default and is there only to provide the correct ordering). I think that in the vast majority of cases a missing field should result in an error, and adding an option to skip rows with missing values would require a way to specify which values are optional, which seems a lot of work for very little gain. The API has been designed to be extensible more than configurable, and I think it works rather well.
Your example was pretty much how I handled it. My specific case could be solved with a decimal_separator=',' argument in the csvbase.Amount constructor, but I am not sure if that's worth it.
I added a subs argument to the csvbase.Amount constructor. It takes a dictionary mapping regular expression patterns to replacement strings. The replacements are applied with re.sub(), in the order provided, before the string is passed to decimal.Decimal(). I figure that this is the simplest and most generic way to handle it. For example, it allows handling amounts with currency prefixes: csvbase.Amount(0, subs={'^\\$(.*)': '\\1', ',': ''}) parses strings like $1,000.00.
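For the decimal-comma case discussed earlier, the same mechanism should work along these lines (a sketch; the exact patterns depend on the input format, and the column name is an assumption):

from beangulp import csvbase

class Importer(csvbase.Importer):
    # Strip thousands dots, then turn the decimal comma into a period,
    # e.g. '1.234,56' becomes '1234.56' before decimal.Decimal() parsing.
    amount = csvbase.Amount('Amount', subs={'\\.': '', ',': '.'})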
Thank you very much for your feedback.
Great! Thank you for the work on this project!
If a CSV file without a header contains only one entry, the current importer would interpret that entry as a header. This PR adds a setting to bypass the automatic header detection.
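With the csvbase importer discussed above, the same can be expressed by declaring that the file has no header row; a sketch (the column positions and date format are illustrative):

from beangulp import csvbase

class Importer(csvbase.Importer):
    # No header row: columns are addressed by position, so a single-row
    # file is never mistaken for a header.
    names = False
    date = csvbase.Date(0, '%Y-%m-%d')
    narration = csvbase.Column(1)
    amount = csvbase.Amount(2)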