SciNim / Unchained

A fully type-safe, compile-time-only units library.
https://scinim.github.io/Unchained

Combining unchained and cligen.dispatch results in extremely long compilation #4

Closed. Vindaar closed this issue 1 year ago.

Vindaar commented 3 years ago

The following trivial code:

```nim
import unchained, cligen

proc main() = discard
when isMainModule: dispatch main
```

results in a compilation that takes 55 s and uses 10 GB of RAM.

Some sort of weird macro / concept business going on!

Showstopper, because cligen needs to work. :)

@c-blake: this is fun, huh? :laughing:

c-blake commented 3 years ago

Lol. I just reproduced it. I would try each of the 3 imports `import unchained` does to see if it is just one of them and go from there. No idea why this would happen, and I have not seen it before.

c-blake commented 3 years ago

Whatever it is needs only `import unchained/units`, and it also fails going back to nim-0.20.2 (though that is about 2x faster than nim-devel). It also has the same problem with cligen-7ffb96b4944f61ea622bf991e03a966b0986b872, with either nim-0.20.2 or nim-devel. So, this is very much not a transient problem, if that is at all helpful.

Running perf on the compiler itself, I get `interiorAllocatedPtr__..` taking up the most by a lot, then `rawExecute__..` and `doOperation__..`. So, yeah, some kind of NimVM+GC problem, which does implicate macro/template/concept/etc., but there is still another 50% of run-time in various GC-related things.

It kind of smells like a leak/loop that might otherwise be infinite but hits some VM bound/limit before stopping, while still succeeding... Your units.nim and cligen are nowhere near the 10 GB that gets eaten. You might be able to confirm that by, e.g., fiddling with VM limit knobs and watching peak memory usage.

Not sure I can help much more unless you find something simple cligen can do to sidestep the issue.

Vindaar commented 3 years ago

Thanks for taking a look! That wasn't necessary. :)

> Whatever it is needs only `import unchained/units`

Yes, this makes sense, as essentially everything so far is in units.nim. I'm pretty sure the problem is related either to the `.` macro here:

https://github.com/SciNim/Unchained/blob/master/src/unchained/units.nim#L1446

or to the `SomeUnit` concept:

https://github.com/SciNim/Unchained/blob/master/src/unchained/units.nim#L305-L325

> and also fails going back to nim-0.20.2 (though that is about 2x faster than nim-devel). It also has the same problem with cligen-7ffb96b4944f61ea622bf991e03a966b0986b872 with either nim-0.20.2 or nim-devel. So, this is very much not a transient problem, if that is at all helpful.

Thanks for checking that! For understanding the cause it won't help too much, but it's good to know anyway.

> Running perf on the compiler itself, I get interiorAllocatedPtr.. taking up the most by a lot, then rawExecute.. and doOperation__... So, yeah..some kind of NimVM+GC problem which does implicate macro/template/concept/etc., but there is still another 50% of run-time in various GC related things.

I did the same yesterday as well (I suppose I should have mentioned that in the issue, but I didn't have enough time to write a lengthy issue yesterday evening). Unfortunately, I didn't really know what to make of this information beyond, yeah, it's a VM issue.

> It kind of smells like a leak/loop that might otherwise be infinite but hits some VM bound/limit before stopping - but still succeeding... Your units.nim and cligen are nowhere near the 10 GB that gets eaten. Might be able to confirm that by e.g. fiddling with VM limit knobs and watching peak memory usage.

Interesting idea, I might give that a try!

> Not sure I can help much more unless you find something simple cligen can do to sidestep the issue.

Don't worry about it! I didn't link you with the intention that you solve my problem. ;) I'll tinker a bit and maybe I can even extract a minimal example to raise a Nim issue about it.

Vindaar commented 3 years ago

Yes, it seems to be related to resolving the concept. Just replacing the concept body with a dummy results in a normal compilation. That leaves the question of how to write a stricter concept that doesn't cause this issue...
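(For reference, a hypothetical sketch of what such a "dummy" replacement means: a concept body that never calls the alias-resolving macro. This is not the actual fix from the repo.)

```nim
# Hypothetical "dummy" concept: trivially satisfiable without any macro
# call, so the VM never runs the expensive resolveAlias during matching.
type
  SomeUnit = concept x
    x.float is float   # matches far too much, but compiles instantly
```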

Vindaar commented 3 years ago

This code is enough to reproduce the issue:

```nim
import cligen
import macros
import strformat  # needed for the `&` formatting in `$` below

proc resolveAlias(n: NimNode): NimNode =
  ## returns the first type that is `distinct` (i.e. converts Newton -> KiloGram•Meter•Second⁻²)
  case n.kind
  of nnkDistinctTy: result = n
  of nnkBracketExpr: result = (n[1].getImpl).resolveAlias
  of nnkSym:
    if n.getTypeInst != n:
      result = n.getTypeInst.resolveAlias
    else:
      result = n.getImpl.resolveAlias
  of nnkTypeDef:
    if n[2].kind == nnkDistinctTy: result = n[0]
    else: result = n[2].getImpl.resolveAlias
  else: result = newEmptyNode()

macro isAUnit*(x: typed): untyped =
  #if true: return newLit false # <- using this compiles fine, i.e. resolveAlias is the culprit
  let x = x.resolveAlias()
  result = newLit true

type
  SomeUnit* = concept x
    isAUnit(x)

proc `$`*[T: SomeUnit](s: T): string = &"{s.float:.4g} {$type(s)}"

proc main() = discard
when isMainModule: dispatch main
```

Turns out my gut feeling about this hacky alias-resolution business was right after all, haha.

edit: in particular, it seems to be related to using `SomeUnit` for a `$` proc. Defining any other proc in its place does not reproduce the problem.

edit2: the problematic branch is of course:

```nim
  of nnkSym:
    if n.getTypeInst != n:
      result = n.getTypeInst.resolveAlias
```

Now how to properly resolve these...

c-blake commented 3 years ago

Glad you isolated the problem! Maybe it's some kind of weird scaling like (number of aliases)^3..4 that punches things up to 10 GB. I don't see that in your resolveAlias, but I didn't look that closely.

I'm sure as a physicist you are aware of the general approach of natural units, not just c=hbar=1, but gravitational c=G=1 units or c=hbar=G=1 Planck units, etc. Essentially the problem is that the dimensionality of one's unit system changes. SI will be like 6 or 7 dimensional (proliferating dimensions to avoid fractional exponents), but natural or gravitational just 1 dimensional.

Avoiding fractions is kind of a losing game, since just as soon as you find it necessary to refer to the factors of anything "new" (e.g. the Coulomb force F = k q_1 q_2 / r^2), those factors (or the constant) will be an occasion for fractional units on things (like the q's, hence the 1/2 powers in CGS electrostatic units). Most would agree that having your system of units restrict what equational relationships can be expressed, or at least type checked, is a weak setup. Hence, losing game. This also connects quite strongly to natural/gravitational/Planck/etc. units.

Anyway, 99+% of computer science people seem unaware of the arbitrariness of dimensionality, just delegating "essential novelty" to SI/physicists, some of whom say "Bah... you only need 1 dim! Keep my formulae tidy!". This unawareness is all the more amusing because unit checking is quite literally the aboriginal "type system", present at the dawn of numerical calculation (which was done by hand for most of its history). Why, metaphorical explanations like apples & oranges even pervade both "intro to units" and "intro to types". Even so, if you asked CS people when humanity started finding types useful, I honestly believe <1% would get the question right. But >99% would immediately acknowledge it as truth once mentioned!

The first unit was probably time (for crop planting/marking the years). Not entirely sure we have a historical record of it, TBH. The Babylonians "won", leading to our still weird variable-radix time notation with base-60 in the middle.

Back to the main dimensionality point - units themselves (except for weird quasi-units like Celsius or Fahrenheit, best restricted to differences only) form a vector space. So projecting down, or back-projecting up, to change whole systems of units are ideas that can be systematized. I think Nim could do the compile-time linear algebra over rationals for these projections/back-projections. I actually think that is less likely to blow up the compiler. :-) Mostly it is just compile-time seq vs. compile-time array at some foundational level, though I did see you using enums for BaseUnitKind.
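To make the exponent-vector view concrete, here is a minimal sketch (illustrative only; the names and the truncated base set are assumptions, not Unchained's actual representation): multiplying units is vector addition of exponents, and raising a unit to a power is scalar multiplication.

```nim
# Minimal sketch, assuming integer exponents (rationals in general):
# a unit is a vector of exponents over the base dimensions.
type
  Dim = enum dMass, dLength, dTime   # truncated base set for illustration
  UnitVec = array[Dim, int]

proc base(d: Dim): UnitVec =
  ## the basis vector for a single base dimension
  result[d] = 1

proc `*`(a, b: UnitVec): UnitVec =
  ## multiplying two units = vector addition of exponents
  for d in Dim: result[d] = a[d] + b[d]

proc `^`(a: UnitVec; k: int): UnitVec =
  ## raising a unit to a power = scalar multiplication
  for d in Dim: result[d] = a[d] * k

const
  Meter  = base(dLength)
  Second = base(dTime)
  Accel  = Meter * Second^(-2)   # m·s⁻², i.e. exponents (0, 1, -2)

static: doAssert Accel == [dMass: 0, dLength: 1, dTime: -2]
```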

"System conversions" are also trickier to do by hand than just "unit conversions" and so maybe even more motivated. So, maybe something for your TODOs/comments in your code. { Also, while exp(kx) or q^pi type transcendental relationships are possible, I think sticking to the rational field is probably best, at least in a software context where finite precision is intrinsic. }

c-blake commented 3 years ago

Also, it sounds like the cligen interaction is just some (explicit or implicit) use of `$`, which might even be just from cligen/argcvt. So, I bet you could find an even more minimal reproducer before filing a Nim bug report. You could use my -d:printDispatch compile flag, saving the output to a file, reproduce the problem without the dispatch macro (but with a lot of cligen imports), and then delete parts of the parser with wild abandon until the problem goes away. Once it does go away, you have probably isolated the interaction and can maybe construct a smaller repro for the Nim core guys. As mentioned, this behavior has been around a long time.

c-blake commented 3 years ago

Alternatively/additionally, you may be able to just read the parser and/or argcvt for related entities and build up a non-cligen-dependent example. cligen is just kind of a big system. I guess the Nim core guys could try all of this as well. Anyway, it's my bedtime over here. So, best of luck!

Vindaar commented 3 years ago

The problematic code is fixed in https://github.com/SciNim/Unchained/commit/7c4d77199e5c55b1c319686c51f96b0e55a4cac8, but the underlying problem still exists in the Nim compiler. Leaving open for that reason.

c-blake commented 3 years ago

Btw, once you have a decent set of physical constants like c, h, pi, G, etc., an alternative to your development with ukNaturalEnergy/MEv and the like would be equational instead of checking/conversional. I.e. rather than converting directly from a physical quantity like velocity or momentum or whatever into SI units, you compute the expression in powers of the fundamental constants that had been/could be set to unity (1 & dimensionless, no pun intended). In some ways an SI unitted value might be less useful than the constant-ed formula/factor. Even starting at run-time with back & forth system conversion and evolving eventually to compile-time would be awesome.

And I really, really don't mean to come off as overcritical, blathering on about all that. It's just not something I've ever seen any other units package do, basically ever, and I saw a lot of formative idea notes in random_thoughts_and_todos.org and untested preliminary code, etc. I wrote a similar long spiel in the late 90s to a guy doing a Haskell package in the exact same way, but he never really did anything along those lines. You know way more than he did out of the gate. So, I'm more hopeful, and really just trying to encourage you in a more general direction than I was seeing, because I see the power within you! :-) No condescension or offense intended!

Vindaar commented 3 years ago

Thanks for your insightful messages. And no worries, I didn't take any of it badly!

> I'm sure as a physicist you are aware of the general approach of natural units, not just c=hbar=1, but gravitational c=G=1 units or c=hbar=G=1 Planck units, etc. Essentially the problem is that the dimensionality of one's unit system changes. SI will be like 6 or 7 dimensional (proliferating dimensions to avoid fractional exponents), but natural or gravitational just 1 dimensional. ... Back to the main dimensionality point - units themselves (except for weird quasi-units like Celsius or Fahrenheit best restricted to only be about differences) are a vector space. So projecting down or back projecting up to change whole systems of units are ideas that can be systematized. I think Nim could do the compile-time linear algebra over rationals to do these projection/back projections

Yes, this is a great point. I'm not sure if a system of units is an actual vector space (I haven't thought enough about it from that angle to verify it fulfils all the axioms, haha), but it is at least extremely helpful to think about the basic quantities in a system of units as essentially a set of basis vectors spanning a vector space. There's some blabbering of mine along those lines in a comment in the code, I think.

> SI will be like 6 or 7 dimensional (proliferating dimensions to avoid fractional exponents), but natural or gravitational just 1 dimensional.

I'm not entirely sure, though, that it's helpful to think about e.g. natural units as such. Yes, purely mathematically one can view such units as having reduced dimensionality. But the moment one talks about anything that is physics and not pure mathematics, the notion of quantities comes right back (hence the reason dimensional analysis in theoretical physics usually considers quantities, not units). In a sense, then, a number with a unit attached in natural units is a lossy statement. Unless information about the intended quantities is available, the physical interpretation of said number is not really possible *outside that system of units*. The problem is of course that our understanding is very limited if we restrict ourselves to talking about things in natural units without mentioning what is a mass, an energy, etc. E.g.:

```nim
let p = 5.MeV
let m = 511.keV
```

Without further information, there is no way to know whether p refers to a momentum (and thus "misses" a c), or whether m refers to a mass (and thus "misses" a c^2).

But more below...

> Btw, once you have a decent set of physical constants like c, h, pi, G, etc., an alternative to your development with ukNaturalEnergy/MEv and the like would be equational instead of checking/conversional. I.e. rather than converting directly from a physical quantity like velocity or momentum or whatever into SI units, you compute the expression in powers of the fundamental constants that had been/could be set to unity (1 & dimensionless, no pun intended). In some ways an SI unitted value might be less useful than the constant-ed formula/factor.

This is, hazily I admit, along the lines of what I'm thinking about in terms of non-SI units. To me it isn't entirely clear yet what a good solution looks like that maps to more or less natural Nim code (i.e. not having to wrap things in a full-blown DSL, for instance, or having to use more complex runtime objects).

The reason natural units are not really implemented is precisely that I don't find the idea of having to define 5.NaturalLength etc. all that nice. It makes things easy to work with, because all quantity information is available, but it is a bit annoying to write and very verbose.

At this moment it is not entirely clear to me what such an "equational" approach should mean from a user perspective, what information would best be kept with each atom, or what such atoms should look like. My brain will keep thinking about these things for me, for sure. :smile:

> Even starting at run-time with back & forth system conversion and evolving eventually to compile-time would be awesome.

In principle I agree! It's just that using distincts is a pretty natural fit for Nim: one achieves very nice results relatively easily (from an implementation standpoint) without having to worry about runtime sacrifices. The thing is that the moment the implementation is complex and runtime-based, its application is suddenly restricted to code where performance does not matter too much. I'd rather have a "good enough" solution that is elegant (in terms of the end result) and that I can freely insert into my code for pure benefit.

And well, for my purposes SI units are by far the most important anyway. :)

c-blake commented 3 years ago

Just a quick reply - they are a vector space, really in the most ordinary way: all 6 base-unit exponents are just a vector; scalar multiplication is exponentiation of the whole unit { like (meters/seconds)^2 }; vector addition is adding exponents; all 0s is the identity; closure holds because any set of exponents is ok; etc. I don't think constraining the field to the rationals changes anything fundamental, except the arithmetic in doing the linear algebra.

The exponent vector/vector space is a sort of "meta unit", which is the real "type" of the physical quantity, while the conversion ratios that go in each slot are a different thing, related to specific realizations of that type, like "kph" vs. "feet per second". I find this helps in thinking about how to systematize things like the null space/range/solutions of the "matrix" you get from a set of constants like c, hbar. As is so common in physics, the way this stuff is taught/learned is less formal, though. Coding it up forces one to be more careful, but then having a fully realized setup like SI units allows one to be less general, perhaps oversimplified. { I think that is the not-so-secret agenda of SI. :-) }

Part of why I think SI is too restrictive is that it just ignores some stuff, like bits & bytes or money. Information is kind of pseudo-physical (more like the Candela, which is based on human eyes), but it can fruitfully combine, like erg/bit in networking or computation. Not being extensible feels sort of like C or Nim with only the ability to have structs of long, int, short, char and no typedefs, to make a kind of lame analogy. So, even meta units should perhaps be extensible to SI dimensions + whatever.

You are absolutely right that you need extra information to deal with projecting/back-projecting across systems; I tried to allude to that. Not only would you need a mini database of meta-units like energy, momentum, velocity... you would also need whichever constants (maybe in SI or maximal-meta-set terms) get set to 1 & dimensionless to define the projection. It may be possible to project down with only the constants, though. For things like "regular physics" there are probably only about 20 of these quantity meta-units aka types, so hard-coding them as an enum might be workable. There are probably only 6..9 common systems: cgs, mks, SI, relativistic-only (just c=1), natural, gravitational, Planck, and maybe 3 or so more domain-specific ones. Anyway, that is enough variety to motivate a general facility, IMO, but even if you had to hardcode them all it would still be better than the competition.
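To make the projection concrete, a standard worked example (illustrative, not something from Unchained): take base dimensions M, L, T and set c = hbar = 1. The constants' dimension vectors span the subspace that gets quotiented out, and one dimension remains, conventionally energy; recovering the SI factors is exactly the back-projection:

```latex
% Illustrative worked example. Dimension (exponent) vectors over (M, L, T):
%   [c]     = L T^{-1}      -> (0, 1, -1)
%   [\hbar] = M L^2 T^{-1}  -> (1, 2, -1)
% Setting c = \hbar = 1 quotients the 3-dimensional space by
% span{(0,1,-1), (1,2,-1)}, leaving 1 dimension: every quantity becomes
% a power of energy, [E] = M L^2 T^{-2} = (1, 2, -2). For momentum:
\[
  [p] = M L T^{-1} = (1,1,-1)
      = \underbrace{(1,2,-2)}_{[E]} - \underbrace{(0,1,-1)}_{[c]}
  \quad\Longrightarrow\quad p \sim \frac{E}{c},
\]
% i.e. a momentum quoted as "5 MeV" in natural units is 5 MeV/c in SI,
% which is exactly the "missing factor of c" ambiguity discussed above.
```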

So, while I get that "just use SI" is the comfort zone that virtually all unit packages stay in, it feels unsatisfying. I'm not necessarily trying to push you into too much hard work or some setup with a lot of run-time cost, but more to motivate why the simple solution feels inadequate and to sketch some approaches to a more general one. I agree the Nim mapping might be tricky-to-impossible with full meta-unit and user-defined-system generality. An awful lot can be ported from run-time to compile-time, though. So, if it were me, I would first focus on the two levels (meta & realized) and just the mechanics of how they all work together. Then, I'd try to make it compile-time. I realize this may feel backwards given "how close" Nim distinct already seems (and I know they used units as an example in the docs).

For what it's worth, I saw an article several years back comparing software approaches to units that examined something like 20 different packages in many programming languages. None of them did user-defined systems/projections/conversions, even though some were in very dynamic languages like Perl or their own thing. So, I really am not being critical, but hopeful. I should try to find that article. Anyway, more later.

c-blake commented 3 years ago

I'm not sure "meta type" vs. "realized type"/"concrete type" are the best terms; I've never heard anyone really spell out a fully capable terminology for this stuff. That the "type" for the purposes of type-checking the validity of additive arithmetic is "distance", not "meters", is perhaps most readily understood, since a programming language could auto-convert/correct an expression like 5.meters - 3.feet (perhaps, as a Nim-like convention, taking the output unit from the 1st ([0]) in an additive/subtractive chain). That the type has a 2nd dimension is understood from 5.meters - 3.feet needing "help" to become realized into a number. So, the type of a fully realized physical quantity is sort of two-level/two-stage (not quite 2-D, but not quite not 2-D { in a different sense of "dimension" than the number of base units }).
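A minimal sketch of this two-level idea in Nim (hypothetical names and API, not Unchained's): a type class plays the "meta type" that gates additive arithmetic, each realized unit carries a conversion ratio, and `-` realizes its result in the unit of the first operand.

```nim
# Hypothetical two-level setup: Distance is the "meta type" used for
# type checking; Meters/Feet are realized units with conversion ratios.
type
  Meters = distinct float
  Feet   = distinct float
  Distance = Meters | Feet

proc ratio(T: typedesc[Meters]): float = 1.0    # reference unit
proc ratio(T: typedesc[Feet]): float = 0.3048   # feet -> meters

proc `-`[A, B: Distance](a: A; b: B): A =
  ## subtraction is valid because both operands are Distances;
  ## the result takes the unit of the 1st operand.
  A(a.float - b.float * ratio(B) / ratio(A))

let d = Meters(5.0) - Feet(3.0)   # "5.meters - 3.feet"
echo d.float                      # 4.0856 (meters)
```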

c-blake commented 3 years ago

This isn't terrible, but does not handle the generalized some-constants-equal-1 unit system construction like natural units: https://www.cs.utexas.edu/users/novak/units95.html { and they say SI includes money. Who knew? :-) }

c-blake commented 3 years ago

And here is the survey I was thinking of: https://gmpreussner.com/research/dimensional-analysis-in-programming-languages

c-blake commented 3 years ago

Also closely related to the "systems with c=1" scenario is dimensional analysis, as you kind of alluded to, but with more linear-algebraic elaboration: https://en.wikipedia.org/wiki/Buckingham_%CF%80_theorem
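For a worked instance (the standard textbook example, not from the thread): a pendulum with period t, length l, and gravity g has 3 variables over the 2 base dimensions L and T, so the theorem guarantees exactly 3 - 2 = 1 dimensionless group, which fixes the period up to a constant:

```latex
% Dimension vectors over (L, T): t -> (0, 1), \ell -> (1, 0), g -> (1, -2).
% The dimension matrix has rank 2, so there is 3 - 2 = 1 dimensionless group:
\[
  \pi_1 = t \sqrt{g/\ell}
  \quad\Longrightarrow\quad
  t = C \sqrt{\ell/g} \quad\text{for a dimensionless constant } C
\]
% (C = 2\pi for small oscillations), recovered from dimensional analysis alone.
```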

c-blake commented 3 years ago

Anyway, sorry to pepper you with all this. The TL;DR is:

Starting with a fully abstract foundation for arbitrary dimensional analysis (not just SI) over any number of base units, then SI units & SI constants, then projection down/back-projection up, and also dimensional-analysis utilities, feels like the right sort of design layering/factoring/feature set.

Feasibility-wise, I think it could yield the compile-time-everything you want, including the expanded <, +, - combo checking-and-converting idea, which could be nice in generic proc scenarios. Maybe not, depending on things in Nim like compile-time seqs being just arrays that cannot be extended. Even so, it would probably still be cleaner to have that kind of organization, with 7 predefined systems differing only in a few parameters, like a base-unit vector/constants sort of thing. Then the dim-analysis stuff like the pi theorem, or system conversions (possibly spelled "add a factor of c^2"), might have to be run-time.

c-blake commented 1 year ago

EDIT: @arkanoid87 wrote (but then deleted, I guess):

> I'm not a fan of cligen and I use cliche for exactly this kind of issue [..] The cligen code feels hacked in place and generates unpredictable issues elsewhere in the code. I won't blame unchained for this, as this happened to me with quite simple projects, too.

AFAICT, there is no outstanding cligen issue here, and @Vindaar solved the issue without any cligen change. Who is to say your unelaborated "quite simple" projects are not at fault (or Nim itself)? Changing to another tool may just tweak a subtle Nim VM interaction effect.

Simplifying problems simplifies their solutions (often by an exponential multiplier, but that always "depends on a lot"). cligen has many more features than cliche, which does not even rise to the level of std/parseopt (I am thinking of short options). Maybe you do not want any of those features, which is fine. Many of cligen's features, and much of its complexity, derive from the requests of other users, as mentioned at the end of the README. An agenda to just delete everything beyond the simplest possible subset is thus hostile to existing users in a way that seems unwise, but I am always open to specific suggestions that respect other users.

If you have something more concrete than "feelings" or other under-articulated complaints, a cligen issue is a better way to express them, as has already been said. Here is not the place.

Vindaar commented 1 year ago

As mentioned by @c-blake, the original issue has long been fixed. I guess I just forgot to close the issue after it was fixed.

arkanoid87 commented 1 year ago

I deleted my comment for a reason.