defunkydrummer / lisp-xl

Common Lisp Microsoft XLSX (Microsoft Excel) loader for arbitrarily-sized / big-size files
MIT License
29 stars 6 forks source link

Empty Unique strings #2

Open lisp-is-the-future opened 5 years ago

lisp-is-the-future commented 5 years ago

Dear Flavio,

searching for a way to read-in .xlsx files using common lisp, since I really want to try to use it in every day life (I am a Bioinformatics Postdoc in Germany), I encountered some problems.

I want to read-in the bigger file which was created using R's openxlsx package. And as a test, I created a dummy small file, which contains actually only the table

A | B | C 1 | 2 | 3 4 | 5 | 6

Reading my xlsx files (test an desired file) in lisp-xl both resulted in empty strings. The :xlsx package could handle the small test, but not the big, desired, file. I could also send you the xlsx files if required.

;; I tried the following using ;; SBCL 1.3.1.debian ;; in emacs/SLIME:

;; I created the small file using libreoffice ;; LibreOffice 5.1.6.2 10m0(Build:2) ;; in ubuntu 16.04

;; and the big file using R openxlsx package (the current one) ;; in ubuntu 16.04

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;; ;; file paths ;;;;;;;;;;;;;;;;;;;;;;;;

(defparameter small-fpath #P"/home/josephus/test/xlsx-files/test.xlsx") (defparameter big-fpath #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx")

;;;;;;;;;;;;;;;;;;;;;;;; ;; try the small file ;;;;;;;;;;;;;;;;;;;;;;;;

(ql:quickload :xlsx) (xlsx:list-sheets small-fpath)

;; => ((1 "Sheet1" "worksheets/sheet1.xml"))

(xlsx:read-sheet small-fpath 1)

;; => (((:A . 1) . "A") ((:B . 1) . "B") ((:C . 1) . "C") ((:A . 2) . 1) ;; ((:B . 2) . 2) ((:C . 2) . 3) ((:A . 3) . 4) ((:B . 3) . 5) ((:C . 3) . 6))

(xlsx:as-matrix (xlsx:read-sheet small-fpath 1))

;; => #2A(("A" "B" "C") (1 2 3) (4 5 6)), (:A :B :C), (1 2 3)

;; the quicklisp package is that from Carlos Ungil ;; add from Akihide Nano's site: (defun as-alist (xlsx) "Creates an a-list from a list of cells of the form ((:A1 . 42) (:B1 . 21))" (mapcar #'(lambda (lst) (cons (intern (concatenate 'string (symbol-name (caar lst)) (write-to-string (cdar lst))) :keyword) (cdr lst))) xlsx))

(defun as-plist (xlsx) "Creates a p-list from a list of cells of the form (:A1 42 :B1 21)" (mapcan #'(lambda (lst) (cons (intern (concatenate 'string (symbol-name (caar lst)) (write-to-string (cdar lst))) :keyword) (list (cdr lst)))) xlsx))

(as-alist (xlsx:read-sheet small-fpath 1))

;; => ((:A1 . "A") (:B1 . "B") (:C1 . "C") (:A2 . 1) (:B2 . 2) (:C2 . 3) (:A3 . 4) ;; (:B3 . 5) (:C3 . 6))

(as-plist (xlsx:read-sheet small-fpath 1))

;; => (:A1 "A" :B1 "B" :C1 "C" :A2 1 :B2 2 :C2 3 :A3 4 :B3 5 :C3 6)

;;;;;;;;;;;;;;;;;;;;;;; ;; now try the big file ;;;;;;;;;;;;;;;;;;;;;;;

(xlsx:read-sheet big-fpath 1)

;; Node does not have a single string child: #S(XMLS:NODE ;; :NAME t ;; :NS http://schemas.openxmlformats.org/spreadsheetml/2006/main ;; :ATTRS ((space ;; preserve)) ;; :CHILDREN NIL) ;; [Condition of type SIMPLE-ERROR]

;; Restarts: ;; 0: [RETRY] Retry SLIME interactive evaluation request. ;; 1: [*ABORT] Return to SLIME's top level. ;; 2: [ABORT] abort thread (#<THREAD "worker" RUNNING {1006CA5C33}>)

;; Backtrace: ;; 0: (XMLS:XMLREP-STRING-CHILD #S(XMLS:NODE ;; :NAME "t" ;; :NS "http://schemas.openxmlformats.org/spreadsheetml/2006/main" ;; :ATTRS (("space" "preserve")) :CHILDREN NIL) :ERROR) ;; Locals: ;; IF-UNFOUND = :ERROR ;; TREENODE = #S(XMLS:NODE ;; :NAME "t" ;; :NS "http://schemas.openxmlformats.org/spreadsheetml/2006/main" ;; :ATTRS (("space" "preserve")) :CHILDREN NIL) ;; 1: (XLSX::GET-UNIQUE-STRINGS #S(ZIP:ZIPFILE ;; :STREAM #<SB-SYS:FD-STREAM for ;; "file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" ;; {1003FBF2B3}> ;; :ENTRIES #<HASH-TABLE ;; :TEST EQUAL ;; :COUNT 12 {.. ;; Locals: ;; SB-DEBUG::ARG-0 = #S(ZIP:ZIPFILE ;; :STREAM #<SB-SYS:FD-STREAM for "file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" ;; {1003FBF2B3}> ;; :ENTRIES #<HASH-TABLE ;; :TEST EQUAL ;; :COUNT 12 {10088E0403}>) ;; 2: (XLSX:READ-SHEET #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" 1) ;; Locals: ;; FILE = #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" ;; SHEET = 1 ;; 3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (XLSX:READ-SHEET BIG-FPATH 1) #) ;; Locals: ;; SB-DEBUG::ARG-0 = (XLSX:READ-SHEET BIG-FPATH 1) ;; SB-DEBUG::ARG-1 = # ;; 4: (EVAL (XLSX:READ-SHEET BIG-FPATH 1)) ;; Locals: ;; SB-DEBUG::ARG-0 = (XLSX:READ-SHEET BIG-FPATH 1) ;; 5: ((LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL)) ;; [No Locals] ;; --more--

;; so :xlsx package cannot handle the big file

;; so it was not possible to open the big file using the :xlsx package

(ql:quickload :lisp-xl) To load "lisp-xl": Load 1 ASDF system: lisp-xl ; Loading "lisp-xl" ;;; Checking for wide character support... WARNING: Lisp implementation doesn't use UTF-16, but accepts surrogate code points. yes, using code points. .. ;;; Checking for wide character support... WARNING: Lisp implementation doesn't use UTF-16, but accepts surrogate code points. yes, using code points. ;;; Building Closure with CHARACTER RUNES .....

(defparameter t1 (lisp-xl:read-sheet small-fpath 1))

;; Reading file /home/josephus/test/xlsx-files/test.xlsx ;; Loading metadata into RAM...3 unique strings found. ;; Uncompressing to File [/tmp/lisp-xl-tempGHU3ALSW.tmp] ...

(print t1)

;; #S(LISP-XL::SHEET ;; :UNIQUE-STRINGS #(" " " " " ") ;; the unique strings are empty ;; :NUMBER-FORMATS NIL ;; :DATE-FORMATS NIL ;; :FILE-NAME #P"/tmp/lisp-xl-tempGHU3ALSW.tmp" ;; :LAST-STREAM-POSITION NIL)

(defparameter t2 (lisp-xl:read-sheet big-fpath 1))

;; Reading file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx ;; Loading metadata into RAM...45814 unique strings found. ;; Uncompressing to File [/tmp/lisp-xl-tempAAURSO2.tmp] ...

(print t2)

;; #S(LISP-XL::SHEET ;; :UNIQUE-STRINGS #(" " " " " " " " " " " " " " " " " " " " " " " " " " " " ;; " " " " " " " " " " " " " " " " " " " " " " " " " " " "

;; [ ... may left out ...]

;; " " " " " " " " " " " " " " " " " " " " " " " " " " " " ;; " " " " " " " " " " " ") ;; :NUMBER-FORMATS NIL ;; :DATE-FORMATS NIL ;; :FILE-NAME #P"/tmp/lisp-xl-tempAAURSO2.tmp" ;; :LAST-STREAM-POSITION NIL)

(lisp-xl-csv:excel-to-csv small-fpath "/home/josephus/test/xlsx-files/test.csv" 1)

;; gives

, , ,, ,,

;; as output in test.csv ;; so strings are really empty

;; so somehow no strings are recognized ... ;; I created the small file using libreoffice ;; LibreOffice 5.1.6.2 10m0(Build:2) ;; in ubuntu 16.04

;; and the big file using R openxlsx package (the current one) ;; in ubuntu 16.04

;; would you have a clue, what the problem could be? ;; the desired big file has 45809 rows and 6 columns.

defunkydrummer commented 5 years ago

Hi @lisp-is-the-future , do you think you could upload the XLSX file with the problem? or at least a partial version of that file?

I'm sorry not to answer before: i can't remember which of my personal mail account is used by GitHub is for notifying.

lisp-is-the-future commented 5 years ago

Dear @defunkydrummer,

sorry, haven't checked this email address since a while.

Sure, I send you both files.

On Tue, Oct 30, 2018 at 8:25 PM defunkydrummer notifications@github.com wrote:

Hi @lisp-is-the-future https://github.com/lisp-is-the-future , do you think you could upload the XLSX file with the problem? or at least a partial version of that file?

I'm sorry not to answer before: i can't remember which of my personal mail account is used by GitHub is for notifying.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/defunkydrummer/lisp-xl/issues/2#issuecomment-434434987, or mute the thread https://github.com/notifications/unsubscribe-auth/AhaEnlIwQCLwxlIICPJYSWyn8njZ_eYZks5uqKebgaJpZM4WX8p- .

defunkydrummer commented 5 years ago

@lisp-is-the-future can you send me the files again? I'm revisiting this.

If you want you can send to my hotmail account, the address is on the readme.

defunkydrummer commented 5 years ago

@lisp-is-the-future i have commited a new version, btw, with a bug fix.

lisp-is-the-future commented 4 years ago

@defunkydrummer sorry for long not answering. I tried it again. But this time I have a smilar but slightly different error:


(ql:quickload :lisp-xl)

(defparameter *f* "/home/josephus/Dropbox/amit_scripts/RawCounts.xlsx")

(defparameter *c* (lisp-xl:read-sheet *f* 1))

#|
The value
  #S(LISP-XL::SHEET
     :UNIQUE-STRINGS #(#1=" " #1# #1# #1# #1# #1# #1# #1# #1#
                       #1# #1# #1# #1# #1# #1# #1# #1# #1# #1#
                       #1# #1# #1# #1# #1# #1# #1# #1# #1# #1#
                       ...
                       #1# #1# #1# #1# #1# #1# #1#..
   [Condition of type TYPE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME interactive evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [ABORT] abort thread (#<THREAD "worker" RUNNING {100EBCC2E3}>)

Backtrace:
  0: (LENGTH #S(LISP-XL::SHEET :UNIQUE-STRINGS #(#1=" " #1# #1# #1# #1# #1# ...) :NUMBER-FORMATS NIL :DATE-FORMATS NIL :FILE-NAME #P"/tmp/lisp-xl-tempSVDN4TBO.tmp" :LAST-STREAM-POSITION NIL))
  1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LENGTH *C*) #<NULL-LEXENV>)
  2: (EVAL (LENGTH *C*))
  3: ((LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL))
  4: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME interactive evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL) {100E84E44B}>)
  5: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL) {100E84E42B}>)
  6: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:INTERACTIVE-EVAL "(length *c*)") #<NULL-LEXENV>)
  7: (EVAL (SWANK:INTERACTIVE-EVAL "(length *c*)"))
  8: (SWANK:EVAL-FOR-EMACS (SWANK:INTERACTIVE-EVAL "(length *c*)") "COMMON-LISP-USER" 205)
  9: ((LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD))
 10: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<FUNCTION (LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD) {2274CD7B}>)
 11: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "/home/josephus/.emacs.d/elpa/slime-20180413.1720/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<FUNCTION (LAMBDA NIL :IN SWANK::SPAWN-WORKE..
 12: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {1008F6B5F3}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {1009053713}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 13: ((LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD))
 14: ((FLET SB-UNIX::BODY :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 15: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 16: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 17: ((FLET "WITHOUT-INTERRUPTS-BODY-1" :IN SB-THREAD::CALL-WITH-MUTEX))
 18: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {7FFFE0F86D4B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THR..
 19: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "worker" RUNNING {100EBCC2E3}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD) {100EBCC28B}> NIL)
 20: ("foreign function: call_into_lisp")
 21: ("foreign function: new_thread_trampoline")
|#
gwangjinkim commented 4 years ago

I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem.

lisp-is-the-future commented 4 years ago

based on :cxml which uses streams (then using klacks inside :cxml) I wrote in the recent days my own package for xlsx and ods file reading.

https://github.com/a1b10/cl-xlsx/blob/master/README.md

On Wed, Aug 14, 2019 at 11:25 AM Gwang-Jin Kim notifications@github.com wrote:

I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/defunkydrummer/lisp-xl/issues/2?email_source=notifications&email_token=AILIJHXT72BHE5NTS2TTOTLQEPFSDA5CNFSM4FS7ZJ7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IHKKQ#issuecomment-521172266, or mute the thread https://github.com/notifications/unsubscribe-auth/AILIJHT2LC3BT5QNRYBQOHDQEPFSDANCNFSM4FS7ZJ7A .

lisp-is-the-future commented 4 years ago

I need this for my bioinformatics stuff, because in bioinformatics a lot of xlsx table files are used - because the biologists doing the experiments rely a lot on MS-Office products. I want to use more lisp at workplace.

On Sat, Aug 24, 2019 at 12:16 PM J Kim lisp.is.the.future@gmail.com wrote:

based on :cxml which uses streams (then using klacks inside :cxml) I wrote in the recent days my own package for xlsx and ods file reading.

https://github.com/a1b10/cl-xlsx/blob/master/README.md

On Wed, Aug 14, 2019 at 11:25 AM Gwang-Jin Kim notifications@github.com wrote:

I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/defunkydrummer/lisp-xl/issues/2?email_source=notifications&email_token=AILIJHXT72BHE5NTS2TTOTLQEPFSDA5CNFSM4FS7ZJ7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IHKKQ#issuecomment-521172266, or mute the thread https://github.com/notifications/unsubscribe-auth/AILIJHT2LC3BT5QNRYBQOHDQEPFSDANCNFSM4FS7ZJ7A .

defunkydrummer commented 3 years ago

based on :cxml which uses streams (then using klacks inside :cxml) I wrote in the recent days my own package for xlsx and ods file reading. https://github.com/a1b10/cl-xlsx/blob/master/README.md On Wed, Aug 14, 2019 at 11:25 AM Gwang-Jin Kim @.***> wrote: I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2?email_source=notifications&email_token=AILIJHXT72BHE5NTS2TTOTLQEPFSDA5CNFSM4FS7ZJ7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IHKKQ#issuecomment-521172266>, or mute the thread https://github.com/notifications/unsubscribe-auth/AILIJHT2LC3BT5QNRYBQOHDQEPFSDANCNFSM4FS7ZJ7A .

Hey, thanks for this!! This will benefit the Lisp ecosystem.

I see your lib loads the whole file into RAM which was what i tried to avoid, because we used very big files, often more than 100MB.

Be sure to add your lib to the CLIKI wiki website!

gwangjinkim commented 3 years ago

Hey! Welcome! Yes, at the moment - but current version - doesn't use :cxml - thus no streams - thus loads everything into RAM. There was a bug and @slyrus fixed the bug by rewriting it in that way with some other dependencies (e.g. :FXML). I wanted however re-introduce the stream usage again. Thank you for your hint to add it to CLKI wiki website. I'll do it as soon as I can!