Bunlong / react-papaparse

react-papaparse is the fastest in-browser CSV (or delimited text) parser for React. It is full of useful features such as CSVReader, CSVDownloader, readString, jsonToCSV, readRemoteFile, etc.
https://react-papaparse.js.org
MIT License
364 stars, 61 forks

How to detect the encoding of the loaded file before displaying data? #82

Open dimitri-hoareau-WEL opened 3 years ago

dimitri-hoareau-WEL commented 3 years ago

Hi!

I had a problem displaying special characters with CSVReader because some CSV files were encoded as ISO-8859-1 instead of UTF-8 (e.g. Pyr�n�es-Atlantiques).

By adding:

<CSVReader
  config={{
    encoding: "ISO-8859-1",
  }}
>
  ...
</CSVReader>

to my code, it works: the browser can now read the special characters without problems. But the new problem is that when a file encoded in UTF-8 is loaded, its characters are no longer displayed properly (e.g. PyrÃ©nÃ©es-Atlantiques).

My problem is that my clients do not all use the same encoding for their CSV files: some use UTF-8, others use ISO-8859-1, and I cannot know in advance which encoding a given file will use.
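As an illustration of the two symptoms described above, the short sketch below decodes the same word with the wrong charset in each direction. It uses only the standard `TextEncoder` and `TextDecoder` APIs and is independent of react-papaparse; the variable names are made up for this example.

```javascript
// Illustration only: one word, two encodings, each decoded with the wrong charset.
const word = "Pyr\u00e9n\u00e9es"; // "Pyrénées"

// UTF-8 stores "é" as two bytes (0xC3 0xA9); ISO-8859-1 stores it as one byte (0xE9).
const utf8Bytes = new TextEncoder().encode(word);
const latin1Bytes = new Uint8Array([0x50, 0x79, 0x72, 0xe9, 0x6e, 0xe9, 0x65, 0x73]);

// An ISO-8859-1 file read as UTF-8: each invalid byte becomes U+FFFD (the "�" symbol).
const latin1ReadAsUtf8 = new TextDecoder("utf-8").decode(latin1Bytes);

// A UTF-8 file read as ISO-8859-1: each two-byte "é" becomes two separate characters.
const utf8ReadAsLatin1 = new TextDecoder("iso-8859-1").decode(utf8Bytes);

console.log(latin1ReadAsUtf8);
console.log(utf8ReadAsLatin1);
```

The first case leaves U+FFFD in the text, which a simple check can look for. The second case produces no U+FFFD, so a check for that character alone cannot catch it.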

Here is my current code:

let changeEncoding = false;

const handleOnFileLoad = (data) => {
  // A U+FFFD replacement character in any cell suggests the file
  // was decoded with the wrong encoding.
  changeEncoding = data.some((row) =>
    row.data.some((cell) => cell.includes("\uFFFD"))
  );

  if (changeEncoding) {
    alert("Some characters of your file will not display properly. Please load your file again.");
    dispatch(setEncodingForExport("ISO-8859-1"));
  } else {
    dispatch(setEncodingForExport("UTF-8"));
  }

  // Drop the header row and keep only the parsed cells of each row.
  const enrollFieldArray = {
    data: data.slice(1).map((row) => row.data),
  };

  dispatch(getDataFromUploadedCsv(enrollFieldArray));
  displayTable();
};

<CSVReader
  onFileLoad={handleOnFileLoad}
  onError={handleOnError}
  ref={buttonRef}
  noClick
  noDrag
  config={encoding}
>

I use Redux, and the "encoding" variable is in the state with this default value:

encoding: {encoding: "UTF-8"}

With this solution, the client must load the file a first time to update the state with the correct value of "encoding", and load the file a second time to display data with the correct encoding.

Is there a native CSVReader method that detects the encoding of the loaded file before the data is displayed?

Thank you very much for your help

Dimitri

Bunlong commented 3 years ago

@dimitri-hoareau-WEL Would you like to check whether the file's data is UTF-8 or ISO-8859-1 before upload?

dimitri-hoareau-WEL commented 3 years ago

Yes! I wanted to know whether it's possible to do that.
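One possible way to do such a check before parsing, sketched here with standard browser APIs rather than any react-papaparse feature: read the file's raw bytes, try to decode them as strict UTF-8, and fall back to ISO-8859-1 if that fails. The helper names `detectEncoding` and `readCsvText` are hypothetical, and the sketch assumes only these two encodings are in play, as described in this issue.

```javascript
// Hypothetical helper (not part of react-papaparse): guess the encoding of raw bytes.
// Assumption: the file is either UTF-8 or ISO-8859-1.
function detectEncoding(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on any invalid UTF-8 sequence
    // instead of silently inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "UTF-8";
  } catch (err) {
    return "ISO-8859-1";
  }
}

// Hypothetical helper: read a File or Blob once and decode it with the guessed encoding.
async function readCsvText(file) {
  const bytes = new Uint8Array(await file.arrayBuffer());
  const encoding = detectEncoding(bytes);
  const text = new TextDecoder(encoding).decode(bytes);
  return { encoding, text };
}
```

Pure ASCII files are reported as UTF-8, which is harmless because both encodings agree on ASCII. An ISO-8859-1 file whose accented bytes happen to form valid UTF-8 sequences would be misdetected, but that is rare in real text. How to feed the detected encoding or the decoded text back into CSVReader is not covered in this thread, so that wiring is left open here.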

luuddan commented 3 years ago

Hi!

I am having the same issue. Is this being worked on?

exaucae commented 3 years ago

I may send a pull request in the coming weeks!

andirkh commented 2 years ago

@exaucae would you mind sharing any update? I'm having this issue as well

exaucae commented 2 years ago

@andirkh, thanks for the bump. The pull request is #100. Feedback welcome!