Customise output names - Githubissues

davidverweij commented 4 years ago

In my current use of the module, it would be particularly handy to have the output files be named using the data from the .csv. To illustrate, I am generating .docx that need to be sent to the recipients, whose name I fill in the template. In order to trace back which .docx should go to who, it would be convenient to allow parametric customisation of the output file names.

I am thinking of some kind of overloading (although I understand this is not intended in Python), or adding parameters - and checking their validity after opening the .csv. Then, we would definitely need to allow multiple values to ensure some type of uniqueness (and check for this too). Perhaps a list parameter or similar?

E.g.

poetry run convert -t template.docx -c data.csv -n ["FIRSTNAME", "LASTNAME"]

With output files as:

john_doe.docx
john_doe_2.docx  # duplicate
jane_doe.docx
...

Thoughts?

jawrainey commented 4 years ago

> In my current use of the module, it would be particularly handy to have the output files be named using the data from the .csv. To illustrate, I am generating .docx that need to be sent to the recipients, whose name I fill in the template. In order to trace back which .docx should go to who, it would be convenient to allow parametric customisation of the output file names.

Yes, being able to generate .docx named by a specific option would be great. Of course, that option must exist in the provided .csv so we will need to add validation. Therefore, the -n option must match row headers in the CSV. (maybe add that as the description?)

> I am thinking of some kind of overloading (although I understand this is not intended in Python), or adding parameters - and checking their validity after opening the .csv. Then, we would definitely need to allow multiple values to ensure some type of uniqueness (and check for this too). Perhaps a list parameter or similar?

We would need to update the variable passed to write, which is currently counter. The multiple values must exist in the csv used, so we would need to validate from the fieldnames. The single_document dict can be used to get the value we're interested in using for the filename (e.g. name). Before we start to enumerate the csvdict we could have a list named filenames and append to it with each loop and then use it as a lookup to ensure uniqueness prior to assigning the filename?

Can you think of a more elegant solution than creating a temporary list for comparison?

An alternative could be to update the specified column (let's say it's name) prior to enumeration to better separate the logic, e.g. pass csvdict to some method which loops over the name column and updates the values in place (so if the name david appears twice, the values become david and david_2). Then we can replace counter with single_document[USER_OPTION], where USER_OPTION in this case is name.
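To make the idea above concrete, here is a minimal sketch of validating the requested columns against the CSV headers and building unique filename stems with a counter instead of a temporary lookup list. The helper name unique_filenames, the lower-casing of values, and the sample data are assumptions for illustration only, not code from the module.

```python
import csv
import io
from collections import Counter


def unique_filenames(rows, columns, fieldnames):
    """Build one filename stem per row from the given columns.

    Hypothetical helper: raises ValueError if a requested column is not
    a CSV header, and appends _2, _3, ... to stems already used.
    """
    missing = [c for c in columns if c not in fieldnames]
    if missing:
        raise ValueError(f"columns not found in CSV headers: {missing}")

    seen = Counter()  # stem -> number of times it has been produced so far
    stems = []
    for row in rows:
        stem = "_".join(row[c].strip().lower() for c in columns)
        seen[stem] += 1
        # First occurrence keeps the plain stem; repeats get a suffix.
        stems.append(stem if seen[stem] == 1 else f"{stem}_{seen[stem]}")
    return stems


# Made-up sample data standing in for data.csv.
data = io.StringIO(
    "FIRSTNAME,LASTNAME\n"
    "John,Doe\n"
    "John,Doe\n"
    "Jane,Doe\n"
)
reader = csv.DictReader(data)
rows = list(reader)  # materialise once so the rows can be reused later
names = unique_filenames(rows, ["FIRSTNAME", "LASTNAME"], reader.fieldnames)
print(names)  # ['john_doe', 'john_doe_2', 'jane_doe']
```

The counter keeps the uniqueness check to a single dictionary lookup per row. One limitation of this sketch: a value that already looks like a generated name (for example a row literally named john_doe_2) could still collide.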

salmannotkhan commented 4 years ago

I was working on this function and something strange is happening. I successfully created the verification:

> the -n option must match row headers in the CSV

But after implementing that feature I can't iterate over csvdict, and I don't know why. Here is the function:

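def generate_names(listnm):
    """Return the names with _2, _3, ... appended to repeated entries."""
    newname = []
    seen = {}  # name -> number of times seen so far
    for name in listnm:
        seen[name] = seen.get(name, 0) + 1
        # The first occurrence keeps its name; later ones get a numeric suffix.
        newname.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return newname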

This is what I added in the convert function:

if custom_name is not None:
    if custom_name not in csv_headers:
        print("column name not found")
        exit()
    # Only build names when a column was requested (row[None] would fail).
    file_names = generate_names([row[custom_name] for row in csvdict])

After this block I can't iterate over csvdict. The function returns a list of names, which we can access at the end using: docx.write(f"{file_names[counter]}.docx")

jawrainey commented 4 years ago

@salmannotkhan -- feel free to make a draft pull request and I can have a look this evening (GMT+1) to try and understand the issue.

I suspect, although I'm not certain, that you cannot enumerate csvdict here because a DictReader can only be iterated over once. It reads rows lazily from the open file, so when you do list(row[custom_name] for row in csvdict) you consume every row and leave the file position at the end; any later loop over csvdict then finds nothing left to read.

To explore my hypothesis above, you could pass csvfile to your method and open it inside generate_names using a with statement. An alternative is to use seek(0) to use the same file ... but this feels like a hack.
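The single-pass behaviour can be reproduced in isolation. This is a minimal sketch, assuming an in-memory io.StringIO in place of the real .csv and a made-up name column. It shows the second loop coming back empty, then two ways around it: rewinding with seek(0) and building a fresh reader, or reading the rows into a list once.

```python
import csv
import io

csvfile = io.StringIO("name\ndavid\njay\n")  # stand-in for the opened .csv
csvdict = csv.DictReader(csvfile)

first_pass = [row["name"] for row in csvdict]
second_pass = [row["name"] for row in csvdict]  # reader is already exhausted

# Option 1: rewind the file and build a fresh reader so the header
# row is parsed as the header again.
csvfile.seek(0)
rewound = [row["name"] for row in csv.DictReader(csvfile)]

# Option 2: read every row into a list once and reuse that list.
csvfile.seek(0)
rows = list(csv.DictReader(csvfile))
names = [row["name"] for row in rows]
names_again = [row["name"] for row in rows]

print(first_pass)   # ['david', 'jay']
print(second_pass)  # []
print(rewound)      # ['david', 'jay']
print(names_again)  # ['david', 'jay']
```

Holding the rows in a list costs memory proportional to the file size, which is probably fine for a mail-merge style CSV; for very large files the seek(0) route avoids that.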

davidverweij commented 4 years ago

I had this issue before - and I concur - it iterates using a reader, which is why the code originally opened the .csv twice (a crude fix I admit).

salmannotkhan commented 4 years ago

Got it

salmannotkhan commented 4 years ago

I'll try to implement the generate_names function inside the output loop so we don't have to open the file twice.

jawrainey commented 4 years ago

I'd suggest instead abstracting the logic into a separate method, as above, since that will make it easier to test. It also keeps the convert method clean and simple 👍

salmannotkhan commented 4 years ago

Done with this. I used seek because I didn't find any other way.

jawrainey commented 4 years ago

Great work -- if you make a PR I can test it and do a code review for you 👍

salmannotkhan commented 4 years ago

yeah sure

davidverweij / csv2docx

Customise output names #16