datasets / awesome-data

Curated list of quality open datasets
https://datahub.io/collections

Unicode Codes #13

Open rufuspollock opened 11 years ago

rufuspollock commented 11 years ago

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

hirntodt commented 9 years ago

I can package this one. The data is already in CSV format.

I will research what's actually in there and get back to you so we can discuss what we want to include.
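On datahub.io, "packaging" a dataset usually means writing a Frictionless Data Package descriptor. The sketch below assumes that convention. The package and resource names are hypothetical placeholders, and the `dialect` block (from the CSV Dialect spec) records that UnicodeData.txt separates fields with semicolons and has no header row, even though it is often described as CSV. Check the property names against the current spec before relying on them.

```python
import json

# Minimal sketch of a Data Package descriptor for the raw file.
# "unicode-data" / "unicode-data-raw" are hypothetical names; the property
# names ("resources", "path", "format", "dialect") follow the Frictionless specs.
descriptor = {
    "name": "unicode-data",
    "resources": [
        {
            "name": "unicode-data-raw",
            "path": "http://www.unicode.org/Public/UNIDATA/UnicodeData.txt",
            "format": "csv",
            # The file uses ';' between fields and has no header row.
            "dialect": {"delimiter": ";", "header": False},
        }
    ],
}

print(json.dumps(descriptor, indent=2))
```

Describing the dialect lets the raw file be published as-is; converting it to comma-separated CSV with a header row is the alternative.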

rufuspollock commented 8 years ago

@hirntodt how are you doing here? It would be great to get this packaged. /cc @pdehaye

zelima commented 8 years ago

@rgrp I can package this one, but I'm not sure what the column names should be.

rufuspollock commented 8 years ago

@zelima great - please move ahead. Can you copy a small sample of the data here for us to look at. @hirntodt suggested this was already in CSV format so we may not need to do much.

zelima commented 8 years ago

@rgrp Yes, it is. It just doesn't have a header row.

sample

0056;LATIN CAPITAL LETTER V;Lu;0;L;;;;;N;;;;0076;
0057;LATIN CAPITAL LETTER W;Lu;0;L;;;;;N;;;;0077;
0058;LATIN CAPITAL LETTER X;Lu;0;L;;;;;N;;;;0078;
0059;LATIN CAPITAL LETTER Y;Lu;0;L;;;;;N;;;;0079;
005A;LATIN CAPITAL LETTER Z;Lu;0;L;;;;;N;;;;007A;
005B;LEFT SQUARE BRACKET;Ps;0;ON;;;;;Y;OPENING SQUARE BRACKET;;;;
005C;REVERSE SOLIDUS;Po;0;ON;;;;;N;BACKSLASH;;;;
005D;RIGHT SQUARE BRACKET;Pe;0;ON;;;;;Y;CLOSING SQUARE BRACKET;;;;
005E;CIRCUMFLEX ACCENT;Sk;0;ON;;;;;N;SPACING CIRCUMFLEX;;;;
005F;LOW LINE;Pc;0;ON;;;;;N;SPACING UNDERSCORE;;;;
0060;GRAVE ACCENT;Sk;0;ON;;;;;N;SPACING GRAVE;;;;

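The sample shows the layout: one code point per line, semicolon-separated, no header, and a fixed 15 fields even when most are empty. A minimal sketch of reading such lines with Python's standard `csv` module (the `parse_rows` helper name is hypothetical):

```python
import csv
import io

# A few of the rows quoted above, copied verbatim.
SAMPLE = """\
0056;LATIN CAPITAL LETTER V;Lu;0;L;;;;;N;;;;0076;
005B;LEFT SQUARE BRACKET;Ps;0;ON;;;;;Y;OPENING SQUARE BRACKET;;;;
005C;REVERSE SOLIDUS;Po;0;ON;;;;;N;BACKSLASH;;;;
"""


def parse_rows(text: str) -> list[list[str]]:
    """Split headerless, semicolon-delimited UnicodeData.txt lines into fields."""
    # QUOTE_NONE: the file has no quoting convention, so '"' is not special.
    reader = csv.reader(io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]  # skip blank lines, if any


rows = parse_rows(SAMPLE)
for row in rows:
    # Every line carries 15 fields; trailing empty fields are still counted.
    assert len(row) == 15
    code_point = int(row[0], 16)  # field 0 is the code point in hex
    print(f"U+{row[0]} {chr(code_point)!r} name={row[1]} category={row[2]}")
```

For this file, `line.split(";")` on each stripped line would give the same fields; `csv.reader` just handles line endings and blank lines for you.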
rufuspollock commented 8 years ago

@zelima can you suggest headers based on your inspection?

zelima commented 8 years ago

To be honest, I don't know, but according to some of the files at ftp://ftp.unicode.org/Public/UNIDATA/, they might be: Unicode,Schematic Name,General_Category property value,,,CJK Radical,Code Point Sequence for USI,,,East_Asian_Width property,The formal name aliases,,,mapping,
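For reference, Unicode Standard Annex #44 documents the 15 UnicodeData.txt fields in order: code point, Name, General_Category, Canonical_Combining_Class, Bidi_Class, decomposition, three numeric fields, Bidi_Mirrored, Unicode_1_Name, ISO_Comment, and the simple uppercase, lowercase and titlecase mappings. The sketch below uses those as column headers and converts the raw rows to comma-separated CSV with a header row. The labels for fields 5 to 8 are hypothetical shorthand (field 5 combines Decomposition_Type and Decomposition_Mapping, and fields 6 to 8 together encode Numeric_Type and Numeric_Value), and `to_csv_with_header` is an illustrative helper name.

```python
import csv
import io

# Column labels based on UAX #44's description of UnicodeData.txt.
# Labels for fields 5-8 are shorthand chosen for this sketch.
FIELDS = [
    "Code_Point",
    "Name",
    "General_Category",
    "Canonical_Combining_Class",
    "Bidi_Class",
    "Decomposition",
    "Numeric_Decimal",
    "Numeric_Digit",
    "Numeric_Value",
    "Bidi_Mirrored",
    "Unicode_1_Name",
    "ISO_Comment",
    "Simple_Uppercase_Mapping",
    "Simple_Lowercase_Mapping",
    "Simple_Titlecase_Mapping",
]


def to_csv_with_header(raw: str) -> str:
    """Convert headerless ';'-delimited rows into comma-separated CSV with a header."""
    reader = csv.reader(io.StringIO(raw), delimiter=";", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes fields containing ','
    writer.writerow(FIELDS)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}: {row!r}")
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv_with_header("005C;REVERSE SOLIDUS;Po;0;ON;;;;;N;BACKSLASH;;;;\n"))
```

Writing through `csv.writer` rather than replacing `;` with `,` matters: range entries such as `<CJK Ideograph Extension A, First>` contain a comma in the name field, and the writer quotes them.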

abiola-adeoye commented 1 year ago

@anuveyatsu I'd like to take this issue as a prerequisite for applying for the DE role.

rufuspollock commented 4 months ago

@Mikanebu did this get completed?

solo11 commented 2 weeks ago

@rufuspollock @anuveyatsu I can help here if this issue hasn't already been worked on. I have a fair idea of where to look for the headers, documentation and references.

anuveyatsu commented 2 weeks ago

Hi @solo11, it would be great if you could work on it. Thanks!

solo11 commented 2 weeks ago

@anuveyatsu This is what I came up with - https://github.com/solo11/Unicode-Codes

anuveyatsu commented 2 weeks ago

Hi @solo11, thanks for this, great stuff! Could you please open a PR against this repository so that I can review it properly? https://github.com/datasets/unicode-characters

solo11 commented 1 week ago

@anuveyatsu Sure, but the repo is empty, so I cannot open a PR.

anuveyatsu commented 1 week ago

@solo11 It's a new repo, so yes, it's empty. Could you fork that repo and push your commits to the fork? Then you can open a PR.

solo11 commented 1 week ago

Forking won't work while the repo has no files. Can you add a README to the repo? See https://github.com/expressjs/.github/issues/1 for reference.

On Wed, Oct 16, 2024 at 6:59 AM Anuar Ustayev (aka Anu) wrote:

@solo11 it is a new repo so yes it's empty. Could you fork that repo and push your commits into a fork? Then you could open a PR.


anuveyatsu commented 1 week ago

@solo11 fixed :+1: