gigascience / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

Replace Unicode Character #276

Open ChrisArmit opened 5 years ago

ChrisArmit commented 5 years ago

To ensure that README files display properly, we need to make the following changes to the TITLE and the DESCRIPTION in GigaDB.

  1. Replace Unicode Character LEFT SINGLE QUOTATION MARK [‘] with APOSTROPHE [']

  2. Replace Unicode Character RIGHT SINGLE QUOTATION MARK [’] with APOSTROPHE [']
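The two replacements above could be sketched in Python 3 as follows. This is only an illustration of the requested mapping: the function name `replace_single_quotes` is hypothetical and is not part of the GigaDB codebase or of the existing readme scripts.

```python
# Hypothetical helper illustrating the two requested replacements.
# Assumption: Python 3, with the TITLE and DESCRIPTION available as str values.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK  -> APOSTROPHE
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
})


def replace_single_quotes(text: str) -> str:
    """Return text with curly single quotes replaced by ASCII apostrophes."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    # The curly quotes are written as escapes so the example is unambiguous.
    print(replace_single_quotes("the insect\u2019s \u2018wing\u2019"))
```

The same mapping could be applied either to the database fields or to the text just before the readme file is written; which of the two is appropriate is not settled in this thread.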

Example: To see an example of APOSTROPHE ['] displaying correctly, find the word [insect's] at the following URL:

ftp://parrot.genomics.cn/gigadb/pub/10.5524/100001_101000/100560/readme_100560.txt

jessesiu commented 5 years ago

Do you mean replace those characters in the database title and description?

kencho51 commented 4 years ago

@ChrisArmit Would you please further clarify your request, like which TITLE and DESCRIPTION?

only1chunts commented 4 years ago

Hi @kencho51, this is a generic issue that may or may not be present in multiple datasets, so it will require a search to find and replace the affected characters. The issue can be illustrated by the attached readme file, which I just generated from a test dataset. The webpage presents the left and right quotation marks absolutely fine; the problem is how they are displayed in the plain-text readme files that we generate from the database. The attached file was generated from the above dataset using the usual script, and it shows the issue with the quotation marks: readme_200051.txt

There are similar issues with certain other non-ASCII text; it would be good to identify those characters and ensure they are correctly represented. Hope that helps clarify.
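One way to identify the other non-ASCII characters would be to scan the text and list each one with its Unicode name. The sketch below assumes Python 3 and that the title or description is already available as a string; `non_ascii_report` is a hypothetical name, not an existing GigaDB function.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> dict:
    """Map each non-ASCII character in text to (Unicode name, occurrence count)."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    # unicodedata.name() takes a default for characters that have no name.
    return {ch: (unicodedata.name(ch, "UNKNOWN"), n) for ch, n in counts.items()}


if __name__ == "__main__":
    sample = "na\u00efve \u2018quoted\u2019 text"
    for char, (name, count) in non_ascii_report(sample).items():
        print(f"U+{ord(char):04X} {name} x{count}")
```

Running a report like this over every title and description would show which characters actually occur, so that a replacement can be decided for each one rather than guessed.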

pli888 commented 4 years ago

@only1chunts Can you tell us where the script is that creates the readme files?

ChrisArmit commented 4 years ago

Hi @pli888

The filepath of the readme.sh script that we use is /tmp/readme.sh

Cheers Chris A

only1chunts commented 4 years ago

It's on the parrot server (192.168.44.247), which is also where we run it (in the individual DOI folders).

The actual script looks like this (I removed Jesse's password):


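#!/bin/bash
# author:Jesse
# Run from inside a DOI folder; the folder name is passed to the API as the doi parameter.
pwd=$(pwd)
filename=$(basename "$pwd")
# Fetch the readme XML for this dataset from the GigaDB API.
wget -O "./$filename.xml" "http://gigadb.org/api/readme?doi=$filename&username=jesse@gigasciencejournal.com&password=********"
# Convert the XML into the plain-text readme file.
python /tmp/readxml.py "./$filename.xml" "./readme_$filename.txt"
# Remove the intermediate XML file.
rm -f "./$filename.xml"
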
pli888 commented 4 years ago

python /tmp/readxml.py ./$filename.xml ./readme_$filename.txt

@only1chunts @ChrisArmit Your readme files are created by a Python program! Also, storing these scripts in the /tmp directory isn't a great idea: it is meant for temporary files, which the operating system or a sysadmin can delete at any time without anyone noticing.

only1chunts commented 4 years ago

Yes, it's a hack, but it does the job for now. The script has disappeared without a trace once, but we have a copy, so we just replaced it. NB: there is another script that we use in the same place, md5sum.sh; it just generates the md5 checksums for all files in the directory and its subdirectories.
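The contents of md5sum.sh are not shown in this thread, so the following is only a sketch of the behaviour described (md5 checksums for every file in a directory and its subdirectories), written in Python 3. The function names `md5_of_file` and `md5_tree` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_tree(root: Path) -> dict:
    """Map each file under root (as a path relative to root) to its md5 digest."""
    return {
        str(p.relative_to(root)): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Calling `md5_tree(Path("."))` from inside a DOI folder would return one entry per file, which could then be written out in whatever format the existing checksum files use.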