a1b10 / cl-xlsx

📜 Read XLSX files with Common Lisp

empty xlsx cells not read #11

Closed slyrus closed 4 years ago

slyrus commented 4 years ago

If I have an xlsx file with empty cells, they are not read. So if I have

     A        B       C
1  foo                bar

I expect to get ("foo" nil "bar"); instead I get ("foo" "bar").
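A minimal sketch of the expected behavior, assuming each cell arrives as a (cell-ref . value) pair such as ("C1" . "bar"). Cells in an xlsx sheet typically carry a reference like "C1", and empty cells are simply absent from the file, so the gaps have to be filled in from those references. The helpers column-index and pad-row are hypothetical names for illustration, not part of cl-xlsx.

```lisp
;; Hypothetical helpers for illustration; not part of cl-xlsx.
(defun column-index (cell-ref)
  "Return the zero-based column index of a cell reference such as \"C1\"."
  (let ((index 0))
    ;; Column letters form a base-26 number with digits A=1 .. Z=26.
    (loop for ch across cell-ref
          while (alpha-char-p ch)
          do (setf index (+ (* index 26)
                            (1+ (- (char-code (char-upcase ch))
                                   (char-code #\A))))))
    (1- index)))

(defun pad-row (cells)
  "CELLS is a list of (cell-ref . value) pairs in column order.
Return the values as a list, with NIL in every skipped column."
  (let ((row '())
        (next 0))
    (dolist (cell cells (nreverse row))
      (let ((col (column-index (car cell))))
        ;; Fill the gap between the previous cell and this one.
        (loop repeat (- col next) do (push nil row))
        (push (cdr cell) row)
        (setf next (1+ col))))))

(pad-row '(("A1" . "foo") ("C1" . "bar")))
;; => ("foo" NIL "bar")
```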

gwangjinkim commented 4 years ago

Thanks so much! Yes, I would also expect 'nil' to be returned. I was recently thinking of coming back to this package in the next few days; there is just so much going on at work at the moment. I have also been interested in Julia. Have you looked at Julia? It is quite lispy. I thought it ran on Lisp, but only the parser is femtolisp, and it lowers the syntax for LLVM. So can one say Julia is running on Lisp underneath or not? :D (I think CL with SBCL will still surpass Julia by far in speed. Actually, I would love to see the data science and bioinformatics communities rediscover Lisp ... R as well as Julia are Lisps in my view: they put a layer of syntax on top to please the C/Python community, but that weakens their Lisp character ... bad compromises ... However, the CL community can learn from Julia's Flux package for machine learning, for example: it takes mathematical formulas and computes their derivatives. For CL, something like that would be easy with Maxima. I am just thinking out loud ...)

I am thinking of putting a flag into the read-xlsx function so that it uses either your code or the stream reader (my previous code). For huge files the stream version will be better, but then I have to see how to integrate all your fixes so far.

Thank you anyway for using & testing (& fixing, so far!). I will add you to the authors list here. And let's create something useful for xlsx for the CL community! ...

On Fri, Jan 31, 2020 at 7:20 AM Cyrus Harmon notifications@github.com wrote:

If I have an xlsx file with empty cells, they are not read. So if I have

     A        B       C
1  foo                bar

I expect to get ("foo" nil "bar") instead I get ("foo" "bar").


slyrus commented 4 years ago

I don't understand the advantage of the streaming approach, at least as it concerns memory. If cl-xlsx allowed user-supplied stream processing functions, sure, I could see how calling a user's function for each row -- allowing the user to do something with each row while avoiding bringing the entire file into memory at one time -- might be nice, but the current API doesn't allow that, does it? Instead of a flag, I'd rather see a new interface that supports user-supplied table/sheet/cell/row processing and scrap/ignore the streaming approach for read-xlsx.

Do you really have very large xlsx files? How big? RAM is cheap these days. I definitely think a better approach to reading parts of an xlsx file (e.g. single sheets) is warranted, but I'm more skeptical about the value of a row/cell based streaming API. And if we did have that we'd certainly want to make sure that 1) users could supply their own table/sheet/row/column/cell processing code (as I said above), and 2) the names/locations of each item were passed to the processing code so that the code could do the right thing (which is, in a way, the problem that this issue is addressing - we were silently ignoring empty cells before).
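To make the two requirements concrete, here is a rough sketch of what such an interface could look like. Everything in it is an assumption for illustration: map-sheet-rows is a hypothetical name, cl-xlsx has no such function as far as this thread shows, and the rows come from an in-memory list where a real implementation would pull them from the parser one at a time.

```lisp
;; Hypothetical interface sketch; not part of cl-xlsx.
(defun map-sheet-rows (function sheet-name rows)
  "Call FUNCTION once per row with the sheet name, the 1-based row number,
and the row's cells as a list of (cell-ref . value) pairs."
  (loop for cells in rows
        for row-number from 1
        do (funcall function sheet-name row-number cells)))

;; Usage: the caller sees where every cell lives, so skipped cells are
;; detectable, and only one row needs to be held at a time.
(map-sheet-rows
 (lambda (sheet row-number cells)
   (format t "~a row ~d: ~s~%" sheet row-number cells))
 "Sheet1"
 '((("A1" . "foo") ("C1" . "bar"))
   (("B2" . 42))))
```

Because the cell references are passed along, the callback can decide for itself whether to pad empty cells with nil or to skip them.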

I guess it might be nice to extend the existing API to return either an array or a list. That's easy enough to do, and I might work on that, along with the ability to just read single sheets.
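A sketch of the array option, assuming a sheet is already available as a list of row lists. The function name rows-to-array is hypothetical, and short rows are padded with nil so the result is rectangular.

```lisp
;; Hypothetical helper for illustration; not part of cl-xlsx.
(defun rows-to-array (rows)
  "Convert a list of row lists into a 2D array, padding short rows with NIL."
  (let* ((height (length rows))
         (width (reduce #'max rows :key #'length :initial-value 0))
         (array (make-array (list height width) :initial-element nil)))
    (loop for row in rows
          for i from 0
          do (loop for value in row
                   for j from 0
                   do (setf (aref array i j) value)))
    array))

(rows-to-array '(("foo" nil "bar") (1 2)))
;; => #2A(("foo" NIL "bar") (1 2 NIL))
```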

As for Julia, R, etc... Yes, I use R quite a lot. The tidyverse, ggplot2, and the whole Rmarkdown world are indispensable tools for me -- even if I would actually like to get rid of them and replace them with CL equivalents. Haven't used Julia. Looks interesting, but not interesting enough to make me give up CL.