Open wenzeslaus opened 3 years ago
I am new to open source ,but I wanna work on this issue I also need guidance
Welcome to the open source world!
Unless you happen to know Perl, it might be most beneficial for you and for the project to rewrite it to Python.
Here are some tips: Read the contributing file. Read what the script utils/csv_dequote.pl is doing. Search online for "reading and writing CSV with Python".
Hello again @wenzeslaus. I would like to start with the good first issues of grass-addons first. I would like to work on this issue next if it's fine
Hello all, please see below:
#!/usr/bin/env python3
#
# csv_dequote.py
#
# Take a .csv file with quoted strings, convert the real comma delimiters
# to pipes (|) and strip away the double quote characters.
#
# USAGE: csv_dequote.py infile.csv [outfile.psv]
#
# If outfile is not given, it will take the basename of the input
# file and give it the .psv extension (".Pipe Sep Vars").
#
import csv
import sys
import os
def csv_dequote(infile, outfile=None):
outsep = '|'
if not infile:
print("No input file was provided.")
print("USAGE: csv_dequote.py infile.csv [outfile.psv]")
sys.exit(1)
if not outfile:
# previous version of this script defined this behaviour thusly:
# "if outfile is not given it will take the basename of the input file and give it the .psv extension"
base, _ = os.path.splitext(infile)
outfile = f"{base}.psv"
print("No output file was provided; using default:", outfile)
if os.path.exists(outfile):
print(f"ERROR: \"{outfile}\" already exists. Will not overwrite; aborting.")
sys.exit(1)
# Registering a custom CSV dialect for consistency across the script
csv.register_dialect('custom', delimiter=',', quotechar='"', escapechar='\\', doublequote=True, lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
try:
with open(infile, mode='r', newline='', encoding='utf-8') as csvin, \
open(outfile, mode='w', newline='', encoding='utf-8') as csvout:
reader = csv.reader(csvin, dialect='custom')
writer = csv.writer(csvout, delimiter=outsep, quoting=csv.QUOTE_MINIMAL)
for row in reader:
try:
writer.writerow(row)
except csv.Error as e:
print(f"Error writing row {row}: {e}")
sys.exit(1)
except IOError as e:
print(f"File I/O error: {e}")
sys.exit(1)
if __name__ == "__main__":
infile = sys.argv[1] if len(sys.argv) > 1 else None
outfile = sys.argv[2] if len(sys.argv) > 2 else None
csv_dequote(infile, outfile)
Considerations:
Python is a good candidate for replacing the Perl script above
The above code is not tested
I'm not 100% sure about what Perl's Text::CSV does under the hood
~ Python's csv module has a "dialect" option for parsing csv's and I don't know if that's right for this context of the above .py file. ~ Python's csv module also has options for setting buffers for csv parsing (depends on usecase of the above script). ~ Does a script converting from .csv to .psv belong in the scope of this repository, or should that be up to the user? ~ ~ Is it safe to assume folks who are using this utility are on a *nix type environment?
Relevant documentation ~
Python docs, csv dialect: https://docs.python.org/3/library/csv.html#csv.Dialect
Text::CSV module documentation https://metacpan.org/pod/Text::CSV#
Hopefully this helps out in part Am curious to hear what folks think of the above and what could be improved
This looks reasonable. Given that this is a helper script, it is not plugged directly into any pipeline we can consider for evaluation or readily test. However, you can try to run the original Perl script on a CSV file and the new Python script on the same file and compare results. I guess the input CSV file should have quoted fields. In any case, you can open a PR and we can continue there.
Python's csv module has a "dialect" option for parsing csv's and I don't know if that's right for this context of the above .py file.
Try to guess (with or without AI) what is the input dialect of the original script. The match does not have to 100%. This is a script one would download and adapt for their own needs.
Python's csv module also has options for setting buffers for csv parsing (depends on usecase of the above script).
Unknown. We can leave that for future.
Does a script converting from .csv to .psv belong in the scope of this repository, or should that be up to the user?
Seems like that's in the scope. PSV is currently often the default in GRASS GIS even when CSV is supported.
Is it safe to assume folks who are using this utility are on a *nix type environment?
It should work on Windows, but people grabbing the file will likely be command line aware whether *nix or not. I would keep the line separators as LF, though.
Helper/utility tool/script _csvdequote.pl triggers warnings from the perl linter in Super-Linter.
To Reproduce
Run perlcritique or perl linter from Super-Linter.
Expected behavior
No linter issues from the code and enabled Perl checks in Super-Linter.
Screenshots
perlcritic:
Super-Linter GitHub Action:
Additional context
Super-Linter checks for Perl need to be disabled now.
This can be fixed by rewriting the script in Python.