OSGeo / grass-addons

GRASS GIS Addons Repository
https://grass.osgeo.org/grass-stable/manuals/addons/
GNU General Public License v2.0
104 stars 155 forks source link

[Bug] Fix or replace csv_dequote.pl Perl script #620

Open wenzeslaus opened 3 years ago

wenzeslaus commented 3 years ago

Helper/utility tool/script _csvdequote.pl triggers warnings from the perl linter in Super-Linter.

To Reproduce

Run perlcritique or perl linter from Super-Linter.

perlcritic .

Expected behavior

No linter issues from the code and enabled Perl checks in Super-Linter.

Screenshots

perlcritic:

tools/csv_dequote.pl: Bareword file handle opened at line 49, column 1.  See pages 202,204 of PBP.  (Severity: 5)
tools/csv_dequote.pl: Bareword file handle opened at line 50, column 1.  See pages 202,204 of PBP.  (Severity: 5)

Super-Linter GitHub Action:

2021-10-02T07:49:28.2513009Z File:[./tools/csv_dequote.pl]
2021-10-02T07:49:28.2556332Z ERROR! Found errors in [perl] linter!
2021-10-02T07:49:28.2559533Z ERROR:[Can't locate Text/CSV.pm in @INC (you may need to install the Text::CSV module) (@INC contains: /usr/local/lib/perl5/site_perl /usr/local/share/perl5/site_perl /usr/lib/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib/perl5/core_perl /usr/share/perl5/core_perl) at ./tools/csv_dequote.pl line 23.
2021-10-02T07:49:28.2561103Z BEGIN failed--compilation aborted at ./tools/csv_dequote.pl line 23.]

Additional context

Super-Linter checks for Perl need to be disabled now.

This can be fixed by rewriting the script in Python.

sumit-158 commented 2 years ago

I am new to open source ,but I wanna work on this issue I also need guidance

wenzeslaus commented 2 years ago

Welcome to the open source world!

Unless you happen to know Perl, it might be most beneficial for you and for the project to rewrite it to Python.

Here are some tips: Read the contributing file. Read what the script utils/csv_dequote.pl is doing. Search online for "reading and writing CSV with Python".

AnuRage-git commented 10 months ago

Hello again @wenzeslaus. I would like to start with the good first issues of grass-addons first. I would like to work on this issue next if it's fine

ChromiteExabyte commented 3 months ago

Hello all, please see below:

WIP Python replacement for csv_dequote.pl Perl Script

#!/usr/bin/env python3
#
# csv_dequote.py
#
#  Take a .csv file with quoted strings, convert the real comma delimiters
#  to pipes (|) and strip away the double quote characters.
#
#  USAGE: csv_dequote.py infile.csv [outfile.psv]
#
#  If outfile is not given, it will take the basename of the input
#  file and give it the .psv extension (".Pipe Sep Vars").
#

import csv
import sys
import os

def csv_dequote(infile, outfile=None):
    outsep = '|'

    if not infile:
        print("No input file was provided.")
        print("USAGE: csv_dequote.py infile.csv [outfile.psv]")
        sys.exit(1)

    if not outfile:
         # previous version of this script defined this behaviour thusly: 
        # "if outfile is not given it will take the basename of the input file and give it the .psv extension"
        base, _ = os.path.splitext(infile)
        outfile = f"{base}.psv"
        print("No output file was provided; using default:", outfile)

    if os.path.exists(outfile):
        print(f"ERROR: \"{outfile}\" already exists. Will not overwrite; aborting.")
        sys.exit(1)

    # Registering a custom CSV dialect for consistency across the script
    csv.register_dialect('custom', delimiter=',', quotechar='"', escapechar='\\', doublequote=True, lineterminator='\n', quoting=csv.QUOTE_MINIMAL)

    try:
        with open(infile, mode='r', newline='', encoding='utf-8') as csvin, \
             open(outfile, mode='w', newline='', encoding='utf-8') as csvout:

            reader = csv.reader(csvin, dialect='custom')
            writer = csv.writer(csvout, delimiter=outsep, quoting=csv.QUOTE_MINIMAL)

            for row in reader:
                try:
                    writer.writerow(row)
                except csv.Error as e:
                    print(f"Error writing row {row}: {e}")
                    sys.exit(1)

    except IOError as e:
        print(f"File I/O error: {e}")
        sys.exit(1)

if __name__ == "__main__":

    infile = sys.argv[1] if len(sys.argv) > 1 else None
    outfile = sys.argv[2] if len(sys.argv) > 2 else None
    csv_dequote(infile, outfile)

Considerations:

~ Python's csv module has a "dialect" option for parsing csv's and I don't know if that's right for this context of the above .py file. ~ Python's csv module also has options for setting buffers for csv parsing (depends on usecase of the above script). ~ Does a script converting from .csv to .psv belong in the scope of this repository, or should that be up to the user? ~ ~ Is it safe to assume folks who are using this utility are on a *nix type environment?

Relevant documentation ~

Python docs, csv dialect: https://docs.python.org/3/library/csv.html#csv.Dialect

Text::CSV module documentation https://metacpan.org/pod/Text::CSV#

Hopefully this helps out in part Am curious to hear what folks think of the above and what could be improved

wenzeslaus commented 3 months ago

This looks reasonable. Given that this is a helper script, it is not plugged directly into any pipeline we can consider for evaluation or readily test. However, you can try to run the original Perl script on a CSV file and the new Python script on the same file and compare results. I guess the input CSV file should have quoted fields. In any case, you can open a PR and we can continue there.

Python's csv module has a "dialect" option for parsing csv's and I don't know if that's right for this context of the above .py file.

Try to guess (with or without AI) what is the input dialect of the original script. The match does not have to 100%. This is a script one would download and adapt for their own needs.

Python's csv module also has options for setting buffers for csv parsing (depends on usecase of the above script).

Unknown. We can leave that for future.

Does a script converting from .csv to .psv belong in the scope of this repository, or should that be up to the user?

Seems like that's in the scope. PSV is currently often the default in GRASS GIS even when CSV is supported.

Is it safe to assume folks who are using this utility are on a *nix type environment?

It should work on Windows, but people grabbing the file will likely be command line aware whether *nix or not. I would keep the line separators as LF, though.