Open alfablend opened 5 months ago
Any chance you can copy+paste the request headers for the site you are trying? I need more exact info.
Load the URL in Chrome and check the Inspect > Network tab.
Thanks for your answer!
When I try to open the link to the CSV file in Chrome, it automatically downloads a file with a .csv extension. The Chrome window stays blank.
So the Network tab, as far as I understand, is blank too.
I use the plain parser in changedetection.io to work with CSV files; Chrome mode does not work with these files.
@alfablend use curl from command line instead
$ curl --head https://changedetection.io/CHANGELOG.txt
HTTP/2 200
server: nginx
date: Tue, 26 Mar 2024 15:13:13 GMT
content-type: text/plain
content-length: 86815
last-modified: Tue, 26 Mar 2024 15:01:02 GMT
vary: Accept-Encoding
etag: "6602e32e-1531f"
strict-transport-security: max-age=63072000
accept-ranges: bytes
try that
Thanks, done (I changed the link in your command to my link first).
As I see, the response declares a UTF-8 charset. But that does not match the encoding of the downloadable CSV file itself, which is windows-1251. Maybe there is a way to force the windows-1251 charset?
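For what it's worth, once the real encoding is known, the bytes can be decoded regardless of what the server's headers claim. A minimal sketch (the sample string is hypothetical; it simulates a windows-1251 payload served with a UTF-8 header):

```python
# Hypothetical illustration: bytes served with a wrong or missing charset
# header can still be decoded manually once the actual encoding is known.
raw = "Привет, мир".encode("windows-1251")  # simulate the server's payload

# Decoding with the header-advertised UTF-8 fails for this payload:
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    pass  # the UTF-8 assumption was wrong

# Decoding with the actual encoding recovers the text:
text = raw.decode("windows-1251")
print(text)  # Привет, мир
```

HTTP client libraries generally let you override the charset the server reports in the same way, by decoding the raw response body yourself.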
it seems the server is returning the wrong information, your CSV is reported as "text/html"
can you attach the CSV file?
Thank you for the explanation! That's the file: urvi (1).csv
import chardet

def detect_encoding(file_path):
    with open(file_path, 'rb') as f:
        rawdata = f.read()
        result = chardet.detect(rawdata)
    return result

result = detect_encoding("urvi.1.csv")
print("The encoding of the file is:", result['encoding'])
print("Confidence level:", result['confidence'])
$ python3 ./test.py
The encoding of the file is: windows-1251
Confidence level: 0.9414230748073508
so the file is windows-1251
but the web server is reporting the wrong encoding type
i'm also not sure if windows-1251 is supported by any of our text difference handlers, more than likely not...
Thank you! If I understand you right, there is a general problem with non-Unicode (non-Latin) content. A solution might be a preprocessor (charset converter).
According to Wikipedia, the windows-1251 charset is still "the second most-used single-byte character encoding (or third most-used character encoding overall)". But, of course, it is still a small percentage at the scale of the global internet, and I understand this may not be a priority task.
the software already has the chardet detection library installed :) so the first step is to write some tests and understand the relationship between the Windows encoding type and websites that return the wrong MIME type
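A rough sketch of what such a preprocessor could look like. The function name and candidate list here are hypothetical, not changedetection.io's actual code; chardet is deliberately left out so the example stays dependency-free, but chardet's detected encoding could feed the candidate list:

```python
def decode_best_effort(raw: bytes, candidates=("utf-8", "windows-1251")) -> str:
    """Hypothetical preprocessor sketch: try candidate charsets in order
    and return the first clean decode; fall back to lossy UTF-8 so the
    diff handlers always receive valid text."""
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate doesn't fit the bytes; try the next
    return raw.decode("utf-8", errors="replace")

print(decode_best_effort("Привет".encode("windows-1251")))  # Привет
print(decode_best_effort(b"hello"))  # hello
```

Trying UTF-8 first is deliberate: plain ASCII and genuine UTF-8 pass through untouched, and only byte sequences that are invalid UTF-8 fall through to the legacy charsets.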
Version and OS 0.45.16 on Windows 11 / Docker
Is your feature request related to a problem? Please describe. CSV files (tables in plain-text format) have no charset encoding selection, so the content of these files may be unreadable. Changedetection, as far as I understand, uses the UTF-8 charset for these files. The CSV files that I need to monitor are in the windows-1251 charset.
Describe the solution you'd like I need the opportunity to select the correct charset. My CSV files are encoded in the windows-1251 charset.
Describe the use-case and give concrete real-world examples
There is a lot of big data in CSV format. It is a text format that represents data tables using commas or other symbols. You can read more about it on Wikipedia: https://en.wikipedia.org/wiki/Comma-separated_values As text files, CSVs may not be encoded in UTF-8; for example, they can use the windows-1251 or koi8-r charset. The CSV files that I try to use with the changedetection app are unreadable due to the absence of charset selection.