Closed apjanke closed 5 months ago
I tried in install Tablicious to help spotting shadowing functions but...
On Octave 7.2 (Ubuntu 20.04LTS) pkg install -forge tablicious
produces an error
>> pkg install -forge tablicious
warning: function /home/promitheas/.local/share/octave/api-v57/packages/tablicious-0.3.6/endsWith.m shadows acore library function
warning: called from
doc_cache_create>gen_doc_cache_in_dir at line 146 column 5
doc_cache_create at line 62 column 12
install>generate_lookfor_cache at line 840 column 5
install at line 241 column 7
pkg at line 612 column 9
warning: function /home/promitheas/.local/share/octave/api-v57/packages/tablicious-0.3.6/startsWith.m shadowsa core library function
warning: called from
doc_cache_create>gen_doc_cache_in_dir at line 146 column 5
doc_cache_create at line 62 column 12
install>generate_lookfor_cache at line 840 column 5
install at line 241 column 7
pkg at line 612 column 9
error: '__event_manager_register_doc__' undefined near line 41, column 7
error: called from
load_tablicious at line 41 column 7
/home/promitheas/.local/share/octave/api-v57/packages/tablicious-0.3.6/PKG_ADD at line 3 column 1
doc_cache_create>gen_doc_cache_in_dir at line 146 column 5
doc_cache_create at line 62 column 12
install>generate_lookfor_cache at line 840 column 5
install at line 241 column 7
pkg at line 612 column 9
Octave 8.2 produces that same error as well
I removed the fillmissing, rmmissing, and standardizeMissing functions (here), and their specialization methods from table
(here). I don't see them left in any other classes.
Added an istable
function (here).
Having trouble regenerating the doco after that. Not sure if I'm just out of practice, or things have changed in the macOS build environment since last time.
I have a few questions.
/docs
folder for building online documentation with the pkg-octave-doc
package? I can make a fork and work on this. Perhaps, we could relocate the files currently in this folder to another like /documentation
. BTW, the current online documentation appears unreachable/broken.pkg-octave-doc
.@deftypefn
instances from e.g.
## @deftypefn {Function} {@var{out} =} istable (@var{x})
to
## @deftypefn {Tablicious} {@var{out} =} istable (@var{x})
so that when users call help
it is apparent that the function is part of the Tablicious package? I know that this is minor thing that I have previously adapted in the statistics package, but it helps a lot when functions shadow core functions or functions from other packages.
Added an
istable
function (here).
istable
is already a method in the table
classdef. Of course, if the input variable is not of class table
, then this method cannot be called. So I think it kind of misses the scope of existing. But istable
is not of much use inside the Tablicious package either. I noticed the Devs note in below the help text, but IMHO it is not a very good idea to duplicate this function acrros multiple packages because we will end up with a labyrinth of istable
functions shadowing one another, depending on the sequence of loaded packages. The most robust way is to check with isa (x, "table")
whenever we want to add table
support to functions in other packages (this is my approach in statistics anyway).
I noticed that there are several functions in /inst
folder which are also available as methods in the table
classdef file. Why is this so? It feels kind of a duplication and I can't figure out the benefits for this. Am I missing something? (in terms of functionality)
I have a few questions.
- How can I install the latest dev from the bulbasaur-24 branch into Octave?
I don't know if Octave's pkg
can install from git repo branches, and I'm not building "head of branch" tarballs to install from. But you mostly don't need to do a full package installation: the only compiled oct/mex file in Tablicious is for time zone conversions inside datetime
, so if you're not using that, you can run it directly from the repo. Clone the Tablicious git repo to your local computer, check out the bulbasaur-24 branch there, and use addpath
to add its inst/
directory to your Octave path. That's how I mostly run it myself.
- Would you like to to reassign the /docs folder for building online documentation with the pkg-octave-doc package?
Maybe! The pkg-octave-doc package is new to me, and looks useful. I'll check it out. I'm currently generating the documentation with some Perl scripts that I copy-pasted from another Octave package a few years back, and using them is a little rough. (E.g., make doc
is currently broken in my build environment and I'm not sure why; that's why some of the online and generated doco is out of date here.)
I'd rather do this myself than take a PR for it. I like to be pretty hands-on with the tooling for my packages. I'll have a look at it soon.
Perhaps, we could relocate the files currently in this folder to another like /documentation
It lives in /docs
now because that supports publishing to GitHub Pages from the master branch, instead of having to maintain a separate branch or repo for the GH Pages site, which is more work. This is just a convenience; I've come around to the view that what's published on GH Pages or the other online site should be the doco for the latest stable release, or versioned doco for each release. Doing it this way from master
only publishes the doco for the latest dev release, and doing versioned doco within the repo adds clutter to both the repo files and the git commit history. If Tablicious gets popular, I'll probably break the documentation site out to a separate repo, at which point the /docs
folder could be moved within this repo.
BTW, the current online documentation appears unreachable/broken.
Oh, thanks for catching that! Was a couple bad links in the main repo README and the doco index, plus GH Pages configuration. I fixed it (I think) and it's mostly working for me now. May take an hour or two for DNS changes to propagate so it works for everyone.
I think we would benefit by adding BISTs and DEMOs in the function files.
Oh, yes. Should definitely have tests and demos. Just haven't found the time to do those yet.
Do you agree to change the @deftypefn instances from [{Function} to {Tablicious} ...] so that when users call help it is apparent that the function is part of the Tablicious package?
Oh, hmm, I'd never seen that before. I'll have a look at how you did it in Statistics. Making it clear that these functions are part of an OF package instead of core Octave seems like a good idea.
Here's a separate issue for these documentation-related things: https://github.com/apjanke/octave-tablicious/issues/110
So I think it kind of misses the scope of [an istable function] existing.
So, the M-code doco doesn't come right out and say this, but I think there are two reasons for istable()
to exist as a function and not just a method.
table
or tabular interface in a duck-typing sense, even if they don't subclass @table
, and can be coded against as such.isblah(x)
instead of isa(x, 'blah')
, like with isstring()
or isdouble()
. Maybe because they're scared of OOP.The first thing – declaring duck-typing-style interface compatibility with table
– lets either the core language or users to define new classes which work like tables, and can be supported by functions that accept table array inputs, but don't inherit from @table
itself as subclasses of it. For example, one could define a table-like class that is a remote proxy for a SQL table on a database server, or a Tall Array style datasource for some HDF5 or Parquet files on disk, or just an alternate implementation of an in-memory table array because you have special storage needs, or a different idea for implementing joins, or just think you can do it better than I did.
In these cases, if functions (like in Statistics, or other packages) that had optional table support in terms of the arguments they accept were to check for table-ness with istable
like this, instead of isa(x, 'table')
in the same place:
function blah(x)
% Works on numerics or tables
if istable(x)
% ... do table-y stuff ...
else
% ... do number-y stuff ...
end
end
...then those functions could work with alternate table-like implementations (concrete classes) without having to know about them in advance.
Now, I doubt anyone using Octave is going to actually want to do that any time soon. That's not a huge motivation for Tablicious per se, just my view of why istable()
exists in Matlab in the first place.
The main reason I want istable()
in Tablicious now is for Matlab compatibility, so users with existing M-code scripts and functions that are already calling istable()
can bring them over to Octave + Tablicious and have them work without having to make source code modifications to convert all the istable(x)
calls to isa(x, 'table')
.
it is not a very good idea to duplicate this function acrros multiple packages because we will end up with a labyrinth of istable functions shadowing one another, depending on the sequence of loaded packages.
Yep, it's a mess. Just the best I could come up with here. As long as all the istable.m
implementations are almost exactly the same, it'll work fine and doesn't matter who shadows who. And istable
seems simple enough that that may be feasible, but who knows.
If core Octave provided an istable()
function that just always returned false, and then classdefs could override it with methods, this would work fine without packages having to supply an istable.m
function definition. But currently, calling istable(x)
raises an "istable undefined" error instead of returning false (if x
is not a class that defines an istable method).
The "clean" modular way to do this would be to separate istable
and other stuff that defines just the interface for tables and the means of detecting them out to a minimal "Service Provider Interfave" package, named like TableApi. Then Tablicious, Statistics, and any other package that wanted to support tables but not have a hard dependency on them could take a non-optional dependency on TableApi, get access to a single consistent definition of istable()
, and then have an optional dependency on or compatibility with Tablicious, and client code could install Tablicious (or another implementation provider package) if they wanted to actually use tables in their code. But that approach is kinda complex and software-engineering-y, and I bet Octave folks would hate it.
The most robust way is to check with isa (x, "table") whenever we want to add table support to functions in other packages
Yep, that should work fine. Statistics can ignore the existence of istable
entirely if it wants, not provide its own istable.m
, and just do isa(x, 'table')
everywhere.
I noticed that there are several functions in /inst folder which are also available as methods in the table classdef file. Why is this so? It feels kind of a duplication and I can't figure out the benefits for this. Am I missing something?
The methods provide class-specific specializations of those functions, and functions provide more general (and possibly degenerate) implementations that work on inputs other than those classes which define methods of the same name. It's the general pattern of having functions which are specialized by method overrides. And it's also possible that in some cases I forgot about one or the other implementation, and there's duplication which can be removed.
Any particular occurrences of that which are bothering you, or you're wondering about?
I made a personal repository of Tablicous and will try to build online documentation with pkg-octave-doc
so we can see how it looks and if you like it then we can merge the changes back to yours. I am only working on the table
classdef file to add functionality for tbl = table ('Size', sz, 'VariableTypes', varTypes)
which is currently not supported. Add BISTs and DEMOs, and then build the online documentation for this to see how it will come out.
Say, @pr0m1th3as, I think I'm going to rename this repo's master
branch to main
, bringing it in line with current prevalent Git and GitHub usage. Would that cause you any problems if I did so today? Would only affect you if you have existing clones or forks of the repo.
...will try to build online documentation with pkg-octave-doc... working on the table classdef file to add functionality for tbl = table ('Size', sz, 'VariableTypes', varTypes)
Cool; both those would be useful!
Say, @pr0m1th3as, I think I'm going to rename this repo's
master
branch tomain
, bringing it in line with current prevalent Git and GitHub usage. Would that cause you any problems if I did so today? Would only affect you if you have existing clones or forks of the repo.
No, its ok. I will make a new clone as needed and merge any changes locally before making a PR to your repo. As said above, I am only making changes in the table.m
classdef file, so anything else you do should not make it difficult to merge back any changes in table.m
.
Any particular occurrences of that which are bothering you, or you're wondering about?
I only spotted the array2table
, cell2table
, struct2table
, and pp
functions. Although the latter is not technically shadowing the prettyprint
method
the array2table, cell2table, struct2table, and pp functions
Those are all just functions. There are no method overrides for them in Tablicious's classes, IIRC.
And I should probably remove pp
. That's a command I like to use for interactive convenience, but it's not an established function elsewhere, and Tablicious probably shouldn't be grabbing a short two-letter global name like that.
Those are all just functions. There are no method overrides for them in Tablicious's classes, IIRC.
You are right, it's `pp' that is a method.
BTW, does table
allow for nested tables?
In the string
class, we need to change the constructor to identify missing cases properly. I will make a separate PR for this. I also spotted a bug in string.m
at line 307. It will be included in the PR
No, its [([renaming master to main])] ok. I will make a new clone as needed [...]
Renamed master to main just now. Please make a fresh clone and fork (and/or update the branch names and tracking configuration in your current clone) at your convenience.
BTW, does table allow for nested tables?
Not currently. It ought to, but I haven't been able to get that working correctly with various indexing operations (()
, {}
, and .
) around nested tables, particularly chained indexing. It's related to subsasgn/subsref somehow, but I don't know if the problem is in my own code in the implementation of the override methods for those, or some issue with core Octave's support for them. Not sure exactyly where I left this off last time I was working on it.
Here's a ticket for nested table support: https://github.com/apjanke/octave-tablicious/issues/22
need to change the constructor to identify missing cases properly [...]
Oops, yeah. Numeric NaNs and NaT datetimes should convert to missing string values. (And really, that isnan/isnat check should be done first, and the datestr
or num2str
call not even be done on NaN/NaT input elements.) Not sure how I missed that.
Note that some of the code in the string
constructor may be redundant (but consistent) with that of string()
methods defined on classes like datetime
or categorical
. That's intentional: I designed the classes in this package to be somewhat "severable", such that you could pick up the implementation of one class and move it to another OF package or to core Octave and it would work with an alternate implementation of its complementary classes there, or with one to be supplied in the future. It's an attempt to make Tablicious friendly to rearranging things w/r/t other packages, like that "table support in core Octave" being discussed over on Discord. Table arrays, strings, and datetime aren't really a natural combination for a single package, aside from that they're all things I use together with tabular data analysis and all absent from core Octave.
a bug in string.m at line 307.
D'oh! Yup. (Adequate tests like you recommend could have caught that.) Think a good approach there is to just have the string.cell
method call string.cellstr
, in line with how string.cell
's helptext talks it returning the same as string.cellstr
and just being there for interface compatibility.
Okay, the generated doco is building again, on the bulbasaur-24
branch. Have a look there and in https://github.com/apjanke/octave-tablicious/issues/110.
I rebased and squashed the bulbasaur-24 branch. You'll need to rebase your changes on that fresh version of it if you're coding on top of that branch.
I'll merge it to main
pretty soon, since publishing to Tablicious's GH Pages currently can only be done from main, and then update the GH Pages site.
I am struggling to make a PR for string.m
class due to continuous changes somehow messing up the bulbasaur-24 branch on my side. Can I just make a PR directly to main for the sting.m
file?
Update, I just made a PR on the bulbasaur-24 branch through the GitHub web interface
Thanks! Looks like this PR is based on the right state of the bulbasaur-24 branch and can merge cleanly. I'll have a look through it.
One issue - this PR mixes a few different and IMHO kind of independent things, which I'd like to make independent choices on whether to accept, and also to have them be distinct commits in the Git commit history. (For readability when skimming the commit history or when examining a diff for one commit.) If you'd like to get author credit for them, could you split them up in to separate PRs? Items I see:
string
ctor, and adding a test for itthis.
in string.cell
's this.tfMissing
reference (fine to put in the same PR as the ctor changes){Function}
to {Tablicious}
I'll leave the bulbasaur-24 branch stable with no further rebases tonight to make it easy for you to do additional PRs here if you want. (I'm in US Eastern Time.)
In the mean time, I'll drop a couple comments on this existing PR.
If you'd like to get author credit for them,
That's ok, merge them as you see fit. I don't have a git history for this, because my local repo was broken due to the repeated re-basing between branches.
Cool. I'll just grab these code changes piecemeal and merge some of them in, then.
Thanks!
Okay, I think I've got all the stuff that conflicts with Statistics at a basic level removed. It's merged to main. See anything left that will cause you major problems here? Think I'm getting close to an 0.4.0 release that will pick these changes up.
Add BISTs and DEMOs [...]
Say, can you or somebody point me at some examples or doco on how to add DEMOs to modern Octave code? This is a good idea but I don't really know how to do it.
Add BISTs and DEMOs [...]
Say, can you or somebody point me at some examples or doco on how to add DEMOs to modern Octave code? This is a good idea but I don't really know how to do it.
See the ClassificationKNN
classdef in the statistics package, which contains BISTs for both the constructor and the method, as well as 2 demos.
What I do in statistics, is that I first add the demo code, then add BISTs related to output testing and at the end I add BISTs for input validation.
e.g.
...
endfunction
%!demo
%! ## Plot a sinewave
%! x = [1:0.1:10];
%! plot (x, sin (x));
%!demo
%! ## Another demo
%! ...
## Test output
%!test
%! x = 1;
%! y = 2;
%! assert (sum (x, y), 3);
%!assert(sum(1, 2), 3)
## Test input validation
%!error<somefunction: some error> somefunction (bad input)
Note that assert
inside a test block musthave a space before the %!
. Without a space (like in %!assert
) is considered a new separate BIST. If you want to reuse input data for more tests, you can use
%!shared x, y
%! x = 1;
%! y = 2;
%!test
%! assert (sum (x, y), 3);
%!test
%! assert (sum (x, x), 2);
%!assert (sum (y, y), 4);
Okay, I think I've got the Statistics compatibility stuff about done, fixed up some other issues, and am about ready to cut a Tablicious 0.4.0 release. I merged most of my pending code to main
, since I had several changes piling up that interacted with each other.
Is there anything left here that presents a major Statistics interoperability issue that we should get done before Tablicious 0.4.0?
I'm going to put BISTS and demos down the road a bit to 0.5.0; that looks like a larger effort. Handing it under this other ticket:
Can you wait a couple of days? I have been working full time the pas 5 days on the table class and I am getting close to submit a PR on it, which includes a lot of bug fixes, and much enhanced compatibility. Ideally, I would need an extra 3-4 days to add all the relevant BISTs as well and cleanup the code base and coding style of earlier code that I haven't tested yet.
Closing as done, since Statistics decided to go with their own table implementation: https://github.com/apjanke/octave-tablicious/issues/115#issuecomment-1925874715.
Let's make Tablicious play better with the Statistics Octave Forge (OF) package.
Statistics's maintainer has expressed interest in adding support for table arrays and other stuff in Tablicious. Let's support that.
Clean up function shadowing and conflicts with the Statistics OF package, particularly WRT the missing/fillmissing stuff. To make Tablicious compatible with Statistics so Statistics can use Tablicious to support table inputs.
High priority, since Statistics would like to do this for a coming-soon release.
Work
Work for this is on the
bulbasaur-24
branch in this repo.TODO
categorical
's use of unconditionalismissing()
calls.ismissing
calls in functions which don't require Statistics.*missing
(fillmissing, rmmissing, standardizeMissing, but not ismissing) in Tablicious-defined classes.istable()
function, with minimal and complete definition that supports copy-paste shadowing.tblish.examples
. theexamples/
folder contains scripts; dunno if they should go in a package)octave.*
package while at it; that name should be left for core Octave to use, I think.__oct_time_binsearch__.oct
to "tblish*" something.References