frictionlessdata / tabulator-py

Python library for reading and writing tabular data via streams.
https://frictionlessdata.io
MIT License

wrong CSV separators, starting from HTML table that has commas inside cells #324

Closed aborruso closed 4 years ago

aborruso commented 4 years ago

Hi, if I run tabulator input.html on the HTML table below, I get

RNDFNC60E16,RIPACANDIDA,85020,POTENZA,250,00
RNDFNC60E16,,,POTENZA,250,00

and not

RNDFNC60E16,RIPACANDIDA,85020,POTENZA,"250,00"
RNDFNC60E16,,,POTENZA,"250,00"

Thank you

<!DOCTYPE html>
<html>
<body>
<table id="results" border="0" class="regpub_dati c35">
        <tbody>
            <tr class="c28">
                <th class="c27">Beneficiario</th>
                <th class="c27">Comune</th>
                <th class="c27">CAP</th>
                <th class="c27">Provincia </th>
                <th class="c27">Importo</th>
            </tr>

            <tr>
                <td class="c31">RNDFNC60E16</td>
                <td class="c31">RIPACANDIDA</td>
                <td class="c31">85020</td>
                <td class="c31">POTENZA</td>
                <td class="c34">250,00</td>
            </tr>

            <tr>
                <td class="c31">RNDFNC60E16</td>
                <td class="c31"></td>
                <td class="c31"></td>
                <td class="c31">POTENZA</td>
                <td class="c34">250,00</td>
            </tr>
        </tbody>
        </table>
        </body>
</html>

roll commented 4 years ago

Hi @aborruso,

That's only because the CLI just prints the rows to the console; it doesn't write real CSV.

from tabulator import Stream

# Saving through the Python API writes a real CSV file, so cells containing commas get quoted
with Stream('tmp/issue324.html') as stream:
    stream.save('tmp/issue324.csv')

This will give you properly quoted CSV:

RNDFNC60E16,RIPACANDIDA,85020,POTENZA,"250,00"
RNDFNC60E16,,,POTENZA,"250,00"
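
The difference between the two outputs can be reproduced with the standard library alone. This is a minimal sketch, assuming the console printer simply joins cells with commas (which matches the output reported above, but is not confirmed from tabulator's source); the variable names are illustrative only.

```python
import csv
import io

row = ["RNDFNC60E16", "RIPACANDIDA", "85020", "POTENZA", "250,00"]

# Naive join (assumed console behaviour): the comma inside "250,00"
# becomes indistinguishable from a field separator.
naive_line = ",".join(row)

# csv.writer quotes any field containing the delimiter
# (QUOTE_MINIMAL is the default quoting mode).
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
quoted_line = buffer.getvalue().rstrip("\n")

print(naive_line)
print(quoted_line)
```

Parsing the naive line back yields six fields instead of five, which is why the console output should not be treated as CSV.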

aborruso commented 4 years ago

Hi @roll, and how can I export to CSV using the CLI?

Thank you

roll commented 4 years ago

It's not supported yet.

Would you like to create a feature request?
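
Until the CLI supports it, a small script could print quoted CSV to standard output. This is only a sketch: `write_csv` and `export_table` are hypothetical helper names, not part of tabulator, and `export_table` assumes the first row of the table holds the headers.

```python
import csv
import sys


def write_csv(headers, rows, out=sys.stdout):
    """Write headers and rows as quoted CSV to a text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)


def export_table(source, out=sys.stdout):
    """Hypothetical helper: read a table with tabulator and print it as CSV."""
    # Imported lazily so write_csv stays usable without tabulator installed.
    from tabulator import Stream

    # headers=1 is an assumption: treat the first row as the header row.
    with Stream(source, headers=1) as stream:
        write_csv(stream.headers, stream.iter(), out)
```

Calling export_table('input.html') from a script and redirecting its output to a file would approximate the missing CLI export.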

aborruso commented 4 years ago

Hi @roll, I have done so.

What is the console output format currently?

Thank you

roll commented 4 years ago

It's kind of mixed: it uses bold for headers and simple comma-delimited output for rows.