Macaulay2 / M2

The primary source code repository for Macaulay2, a system for computing in commutative algebra, algebraic geometry and related fields.
https://macaulay2.com

Feature request: allow only reading a package header #1085

Closed mahrud closed 4 years ago

mahrud commented 4 years ago

This section of Macaulay2Doc, which is responsible for making a list of packages, calls needsPackage on every package, but this isn't necessary since all we need is the information in the package header. https://github.com/Macaulay2/M2/blob/b205e6fc41833e209601704952563cee15cd887d/M2/Macaulay2/packages/Macaulay2Doc/overview_packages.m2#L7-L32

Besides the obvious issue that this is unnecessary, it slows down installation of packages in parallel. For instance, before Macaulay2Doc is installed, if you install any other package (even FirstPackage), somehow* M2 tries to generate that particular documentation node of Macaulay2Doc, and unintentionally runs path checks from SemidefiniteProgramming, which produce a lot of errors if the software isn't installed: https://github.com/Macaulay2/M2/blob/b205e6fc41833e209601704952563cee15cd887d/M2/Macaulay2/packages/SemidefiniteProgramming.m2#L91-L96

I have been thinking about refactoring packages.m2 (and html.m2) for this and several other reasons, but wanted to make this issue for now in case anyone has other ideas.

*: not sure why or where this happens yet.

mahrud commented 4 years ago

Example of the errors produced from SemidefiniteProgramming:

--Installing package PackageCitations
--making example results for PackageCitations
--making example results for cite
which: no  in (...)
which: no mosek in (...)
which: no sdpa in (...)
Solvers configured: CSDP
Default solver: CSDP
-- polymake not present

DanGrayson commented 4 years ago

There's a related issue: https://github.com/Macaulay2/M2/issues/776

How would you get the header without loading the package?

mahrud commented 4 years ago

I presume we can stop reading the file when "newPackage(...)" closes. Like a break in the evaluation loop.

DanGrayson commented 4 years ago

Good idea. Set a static flag, and then at the end of "newPackage", conditionally return the symbol "end", which will end the parsing of the file. Oops, there is a problem: the end of file causes "endPackage" to be run, and the package will be entered into the list of packages as a complete package. So you have to override that, too, and make sure the next "needsPackage" rereads the file instead of just opening the package's empty dictionary.
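For illustration only, here is a minimal Python sketch of the control flow described above. It is not Macaulay2 code and not how M2 implements anything: the `Loader` class, `read_header`, `needs_package`, and the toy `SOURCE` are all hypothetical stand-ins. The flag makes the header call stop the read, the end-of-file registration is skipped in that mode, and a later full load rereads the file.

```python
class StopAfterHeader(Exception):
    """Raised at the end of the header call to stop reading the file early."""


class Loader:
    """Hypothetical stand-in for a package loader; not Macaulay2's API."""

    def __init__(self):
        self.loaded = {}          # fully loaded packages: name -> namespace
        self.headers = {}         # header-only cache: name -> options
        self.header_only = False  # the "static flag" from the comment above

    def new_package(self, name, **options):
        self.headers[name] = options
        if self.header_only:
            # Plays the role of returning the symbol `end` after newPackage.
            raise StopAfterHeader

    def read_header(self, name, source):
        self.header_only = True
        try:
            exec(source, {"newPackage": self.new_package})
        except StopAfterHeader:
            pass  # the package is deliberately NOT registered as loaded
        finally:
            self.header_only = False
        return self.headers[name]

    def needs_package(self, name, source):
        if name not in self.loaded:  # a header-only read does not count
            namespace = {"newPackage": self.new_package}
            exec(source, namespace)  # reread the whole file
            self.loaded[name] = namespace  # stands in for endPackage
        return self.loaded[name]


SOURCE = '''
newPackage("Demo", Headline="a toy package", Version="0.1")
body_ran = True
answer = 6 * 7
'''

loader = Loader()
header = loader.read_header("Demo", SOURCE)
print(header["Headline"])        # header is available
print("Demo" in loader.loaded)   # but the package is not marked as loaded
package = loader.needs_package("Demo", SOURCE)
print(package["answer"])         # the full load reread the file body
```

The point of keeping `headers` and `loaded` separate is exactly the override mentioned above: a header-only read must not leave behind something that a later `needsPackage` would mistake for a complete package.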

mahrud commented 4 years ago

Something like that, yes.

Before getting to that, I wanted to see what's really slowing down installPackage before Macaulay2Doc is installed, and found that the call from installPackage to help takes 18 seconds every time: https://github.com/Macaulay2/M2/blob/d6c70324fb5dc511bfec32cee9a8880a1e48e5e9/M2/Macaulay2/m2/html.m2#L849 That call is slow because help has this line: https://github.com/Macaulay2/M2/blob/d6c70324fb5dc511bfec32cee9a8880a1e48e5e9/M2/Macaulay2/m2/document.m2#L1177 which in turn calls this: https://github.com/Macaulay2/M2/blob/d6c70324fb5dc511bfec32cee9a8880a1e48e5e9/M2/Macaulay2/m2/document.m2#L20

And here is my question: the top line of that function says:

-- this function should be made obsolete, because we should install the Macaulay2Doc package first

Why is that? If I'm just installing a single package, why should Macaulay2Doc be installed first? If installing or loading it sets a certain variable or generates something specific, why not make that an individual thing instead?

DanGrayson commented 4 years ago

The Macaulay2Doc package generates the Macaulay2 documentation for the core of Macaulay2. There's no way to generate just one documentation node.

The reason this package should be installed first is so that the numerous links from the documentation in a package to documentation of core items can be correctly installed. We need to know that those things are present, so we know which links in the current package to flag as bad ones going to non-existent documentation nodes. There is no way to generate just one documentation node at a time, because we have to flag the bad links.

The same goes if the package links to any documentation nodes in any other packages -- those packages should be installed first. On the other hand, we don't have a way to enforce that the resulting graph of dependencies is linear, unless we would start insisting that links to documentation nodes are made only to packages that are imported or exported by the newPackage command (plus Macaulay2Doc).

mahrud commented 4 years ago

I'm sure there are easier ways to handle links, e.g. Wikipedia doesn't need to traverse its entire collection to know which links don't exist. Why not assume at install time that the link is correct and check later? Then the graph doesn't need to be a tree.
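A rough Python sketch of what "assume the link is correct and check later" could look like. This is a toy model, not Macaulay2's documentation machinery: the `install` and `check_links` functions and the package and node names are made up for illustration.

```python
def install(package, nodes, site):
    """Record a package's nodes and outgoing links without validating them."""
    for node, links in nodes.items():
        site[(package, node)] = list(links)


def check_links(site):
    """Return (source, target) pairs whose target node was never installed."""
    return sorted(
        (source, target)
        for source, links in site.items()
        for target in links
        if target not in site
    )


site = {}

# Package B is installed first, even though it links into package A.
install("B", {"usage": [("A", "ring"), ("A", "missing")]}, site)
install("A", {"ring": []}, site)

# One pass at the end reports only the links that are still dangling.
broken = check_links(site)
print(broken)
```

With this ordering the install order no longer matters; the cost is that bad links are only discovered in the final pass rather than while the offending package is being installed.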

DanGrayson commented 4 years ago

We do that, too. We have a program I wrote called 'html-check-links'.

Why do you want to be able to install a package before Macaulay2Doc has been installed?

mahrud commented 4 years ago

> We do that, too. We have a program I wrote called 'html-check-links'.

So we should be able to stop checking links as we install. I also noticed that info somehow calls needsPackage "Macaulay2Doc" via a different route (i.e. not through checkLoadDocumentation()). Do you know where that happens? This is the culprit call to info: https://github.com/Macaulay2/M2/blob/d6c70324fb5dc511bfec32cee9a8880a1e48e5e9/M2/Macaulay2/m2/html.m2#L941

> Why do you want to be able to install a package before Macaulay2Doc has been installed?

Macaulay2Doc is simply too bulky as a package and takes too long to generate, so I want to install other packages in parallel. Eventually I might also want to generate my own documentation format, but this dependency on Macaulay2Doc slows the process quite a lot.

DanGrayson commented 4 years ago

If we stop checking links as we install, I predict you'll get thousands of missing links when you check at the end, which will all have to be fixed.

mahrud commented 4 years ago

Referencing #187 on installing packages in parallel.

mahrud commented 4 years ago

Referencing #508 on reading package header.

mahrud commented 4 years ago

Referencing #625 on generating documentation for individual packages.

mahrud commented 4 years ago

Quoting from @DanGrayson in #508 to consolidate the issues:

> In the file M2/Macaulay2/packages/Macaulay2Doc/overview_packages.m2 we load every package just to get the headlines. A new option to needsPackage, loadPackage, and newPackage called HeaderOnly would allow for a solution of lighter weight -- the package file would stop loading after the newPackage command, so the header information would be available (where?), but the rest of the package would not be loaded.

mahrud commented 4 years ago

This is now possible with the readPackage function.

DanGrayson commented 4 years ago

> This is now possible with the readPackage function.

I hope you mention this in the change log in changes.m2.

mahrud commented 4 years ago

The node is documented, but I'll make a PR to add this.

DanGrayson commented 4 years ago

All new features, in addition to being documented, have to be mentioned in the change log, so people can find out about them.