eclipse-ocl / org.eclipse.ocl

Eclipse Public License 2.0
1 stars 0 forks source link

[evaluator] Inadequate support for large integers #443

Closed eclipse-ocl-bot closed 2 months ago

eclipse-ocl-bot commented 2 months ago

| --- | --- | | Bugzilla Link | 290605 | | Status | CLOSED FIXED | | Importance | P3 normal | | Reported | Sep 25, 2009 17:11 EDT | | Modified | May 29, 2012 13:21 EDT | | Depends on | 318048 | | Blocks | 156363 | | Reporter | Oliver Wong |

Description

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/3.0.195.21 Safari/532.0\ Build Identifier: I20090611-1540

In the OCL specification at http://www.omg.org/docs/ptc/06-05-01.pdf section 11.4.2, it says that the OCL type "Integer" refers to the mathematical concept of an integer, with no upper bound. However, when we try to pass in a query with large integers, we get a parse exception. For example, the query "1196888373 < 9999999999" generates the following stack trace:

Caused by: java.lang.NumberFormatException: For input string: "9999999999"\ at java.lang.NumberFormatException.forInputString(Unknown Source)\ at java.lang.Integer.parseInt(Unknown Source)\ at java.lang.Integer.valueOf(Unknown Source)\ at org.eclipse.ocl.parser.AbstractOCLParser.createIntegerLiteralExpCS(AbstractOCLParser.java:516)\ at org.eclipse.ocl.parser.OCLParser.ruleAction(OCLParser.java:1154)\ at lpg.lpgjavaruntime.DeterministicParser.processReductions(DeterministicParser.java:55)\ at lpg.lpgjavaruntime.DeterministicParser.parse(DeterministicParser.java:115)\ at org.eclipse.ocl.parser.OCLParser.parseTokensToCST(OCLParser.java:106)\ at org.eclipse.ocl.lpg.AbstractParser.parseTokensToCST(AbstractParser.java:220)\ at org.eclipse.ocl.parser.OCLAnalyzer.parseConcreteSyntax(OCLAnalyzer.java:140)\ at org.eclipse.ocl.parser.OCLAnalyzer.parseInvOrDefCS(OCLAnalyzer.java:263)\ at org.eclipse.ocl.internal.helper.HelperUtil.parseQuery(HelperUtil.java:164)\ at org.eclipse.ocl.internal.helper.OCLHelperImpl.createQuery(OCLHelperImpl.java:175) ... 3 more

It appears that the current implementation is mapping OCL's Integer type onto Java's int type. In our EMF models, we're using fields representing timestamps in the form of milliseconds since epochs, and so we frequently use numbers which are too big to be store in Java ints, and which need to be instead be stored in longs.

It seems like the "most correct" solution would be to map OCL's Integer onto Java's BigInteger, but I guess that might be a performance problem. Perhaps we can change the implementation to use long "for now" as a quick patch solution, until a solution is found to support truly arbitrarily large integers?

Reproducible: Always

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Sep 25, 2009 17:44

There is support for BigDecimal and BigInteger values that originate in models, but as you observe this is missing in the concrete syntax.

I think the correct solution is to check the string length and parse for a BigInteger when too many digits or too large a value for Integer.

Internally the generic java.lang.Number inheritance should sort out the alternatives.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Dec 18, 2009 13:12

Bug 298205 has been marked as a duplicate of this bug.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 12, 2010 06:21

I'm trying to support BigInteger and BigDecimal in the revised evaluator in the ReflectiveLibrary branch.

The observable problem is that the parser method createIntegerLiteralExpCS uses Integer.valueOf to parse the lexed string. This is easily fixed but then

a) IntegerLiteralExpCS.integerSymbol is an Integer\ b) IntegerLiteralExp.integerSymbol is an Integer

(ditto for UnlimitedNatural, Real)

Therefore we

either introduce BigIntegerLiteralExp etc and surprise clients with new classes\ or change IntegerLiteralExp.integerSymbol to be a Number (Short,Integer,Long,BigInteger or Float,Double,BigDecimal in practice).

Introducing new classes will give tightly-coupled clients run-time failures since the requisite handlers will be missing.

Changing the type to Number will give tightly-coupled clients compile-time failures.

I prefer the change to Number since variant IntegerLiteralExp or RealLiteralExp classes are not supported by the OCL specification, and since this exposes rather than hides the API change.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 12, 2010 06:26

Bug 246895 has been marked as a duplicate of this bug.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 12, 2010 14:36

Created attachment 155901 (attachment deleted)\ AST/CST change for Big numbers

The attached patch changing from Integer/Double to Number for {Integer,Real,UnlimitedNatural}LiteralExp{,CS} surprisingly does not break any OCL or QVTd test. Extra JUnit tests validate the parsing, serialisation and deserialisation of Big numbers.

The patch causes no compilation problems for QVTo. I am unable to tell whether QVTo run-time is affected; 790/815 tests appear to fail in my workspace.

NB. Previous behaviour was to NFE on big numbers.

The new behaviour parses, serialises and deserialises big numbers.

Maths on Big numbers is poorly supported. The rewrite of operations in the ReflectiveLibrary branch will fix this.

Temporarily some high precision Double numbers will be parsed as BigDecimal causing failure when used in mathematical operations.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 13, 2010 05:10

(Patch seems to have a few missing @since's.)

If we're changing the type of integerSymbol and realSymbol anyway, we could introduce new wrapper types org.eclipse.ocl.types.Integer and org.eclipse.ocl.types.Real that encapsulate the choice of representation type. This would localize some of the Double or BigDecimal logic. [I'm undecided whether this is a good idea, since it creates an extra object for every numeric literal and is not backwards compatible with the serialised AST, so I raise this possibility inviting positive/negative comments.]

We probably need some xxxOption to allow users to control the threshold at which arithmetic promotes from Double to BigDecimal (Integer promotion is easy; grow on overflow). The patch chooses the precision of a litertal based on whether the string representation has 15 or more letters so -1.0 is Double and -1.000000000e00 is BigDecimal. When to promote Double / Double to BigDecimal is very subjective. BigDecimal has a precision, so choosing an appropriate precision is also subjective.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 13, 2010 06:02

The test_supportForELongAttributes_198451 provides some insight into the existing behaviour.

Integer, Long, BigInteger, Double and BigDecimal

are preferred representations.

Byte, Short, Float

are tolerated representations that are converted to Integer, Double.

The test

assertEquals(1, evaluate(helper, thing, "numeros->at(1).asLong() - numeros->at(2).asLong()"));

passes because the Long - Long maths returns Long but the result which is 1 is passed through NumberUtil.coerceNumber which tries to find a simpler representation and converts long 1 to integer 1 which is equal to the asserted integer 1.

This behaviour is a little expensive but well-defined. The patch extends this support for arbitrary intermediate results to arbitrary persisted results. We just want to refine the new createRealNumber to use BigDecimal for anything with any fractional digits or more than 15 integer digits and to coerceNumber the result back to Double where possible. No change to createIntegerNumber which is already 'optimal'. [0.1 cannot be precisely represented by Double, whereas it is trivial with BigDecimal.]

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 13, 2010 06:34

Further:

        assertEquals((int) maxIntMinusOne, evaluate(helper, thing,\
            String.format("(%d + %d).div(2) - 1", maxInt, maxInt)));

passes when adding an Integer overflow because all numeric arguments are promoted to a higher precision before any numeric operation.

Ow! we are incurring a promotion cost on every numeric argument, and a demotion cost on every numeric result.

Ow! we are losing precision on decimals; bug 299477.

Ow! we are not detecting long overflows; Java does not report arithmetic issues.

Given the costs we are incurring through conversions, we might as well just use BigInteger and BigDecimal throughout. The AST once again has narrow types.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 14, 2010 09:45

Created attachment 156100 Use of BigDecimal and BigInteger internally

[Attached patch applies to latest CVS, in which generated code has been regenerated to avoid reviewing 100 files of gratuitous differences on this patch.]

Changing the internal and persisted representations of Real and Integer to BigDecimal and BigInteger has a bit more impact than changing to Number, but leads to more predictable simpler behaviour and solves most of the problems other than extensibility and legibility. API Compatibility is enhanced for scalar values by invoking NumberUtil.coerceNumber only on the final result rather than every intermediate calculation.

[Since UnlimitedNaturalLiteralExp.intergerSymbol is changing type, we might as well track Issue 14196 and change its spelling at the saem time.]

Impact on tests: non-scalar result checks must use BigDecimal or BigInteger. Impact on QVTd: two one-line changes are required to resolve compilation errors. Impact on QVTo: no changes needed for compilation problems.

Visible change:

RealLiteralExp(CS).realSymbol is BigDecimal rather than Double. IntegerLiteralExp(CS).integerSymbol is BigInteger rather than Integer. UnlimitedNaturalLiteralExp(CS).integerSymbol is BigInteger rather than Integer and name unlimitedNaturalSymbol.

setRealSymbol, setIntegerSymbol provide some compatibibility for original signatures.

Scalar return from OCL.evaluate unchanged because coerceNumber invoked on result. Collection/Tuple query results contain BigDecimal or BigInteger rather than Double, Integer.

New: OCL.rawEvaluate supports return of BigDecimal or BigInteger bypassing coerceNumber.

Hidden changes:

oclAsType now works and is tested for Integer/Real conversions. BigInteger and BigDecimal internal results should not throw exceptions -* now works and is tested

:notepad_spiral: Bug290605d.patch

eclipse-ocl-bot commented 2 months ago

By Adolfo Sanchez-Barbudo Herrera on Jan 15, 2010 11:51

Ed,

When trying apply the patch with in the ReflectiveLibrary branch I have a lot of problems. Fewer problems (just some merge problems in in a tests plugin) while applying against HEAD, so I guess that "latest CVS" = HEAD.

Looking into it.

Cheers,\ Adolfo.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 15, 2010 12:33

Yes. The patch is for HEAD.

I have just committed the change to the ReflectiveLibrary Branch which has now caught up with LPG2 etc. The Ecore tests run with about 55 failures; to be investigated. The UML tests do not yet run.

Switching to BigDecimal/BigInteger makes the revised approach much easier too. Take a peek at org.eclipse.ocl.evaluator.operations to see how modular it now is.

It would be good to have a preliminary approval that we want to change to BigInteger/BigDecimal always. Then perhaps the nitty gritty review can be a little slower.

The patch probably fixes 20 to 50 obscure problems, but may well creates nearly as many similarly obscure problems. It is only with the ReflectiveLibrary that we can really clear these problems. I expect to clear at least 15 bugzillas in one go.

eclipse-ocl-bot commented 2 months ago

By Adolfo Sanchez-Barbudo Herrera on Jan 15, 2010 13:50

Ed,

To be honest, I'm learning rather than doing a nice reviewing with this bugzilla, so my contribution on this is being simply reduced to catch some noise. Let's see if I can help ;).

About the approach. In principle, (and you know my position about MDT-OCL 3.0.0), I'm not worried about any inconvenience related to breaking API, or any backwards compatibility. Actually, we must take advantage to improve the implementation, and it seems that you are fixing a lot of numeric conversions related nuisances. So,

  1. I'm happy with the solution (my +1 to the approach)
  2. I wouldn't include "setRealSymbol, setIntegerSymbol methods to provide some compatibibility for original signatures", since
    • it partially provides compatibility.
    • I don't like the not generated code.\ So if the API need to be broken for the expression symbol, let's learn clients to use the new one instead of including partial compatibility solutions.

Apart from that, have a look to the following comments

Have a nice weekend,\ Adolfo.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jan 21, 2010 15:48

Thanks for the review/observations:

The plugin references seem toi be a limitation of the UML codegen. I can't see any difference between EDouble and EBigDecimal in the UML or Ecore profile so ... ?

This code has evolved slightly to provide better compatibility in the ReflectiveBranch.

I'm no longer inclined to commit this for 3.0.0 if the ReflectiveLibrary doesn't also go in 3.0.0.

eclipse-ocl-bot commented 2 months ago

By Alexander Igdalov on Feb 01, 2010 07:40

(In reply to comment #13)\

I'm no longer inclined to commit this for 3.0.0 if the ReflectiveLibrary doesn't also go in 3.0.0.

I agree. Let's include it into 4.0.0.

-----------\ Unlike Java (and other multipurpose languages) there is no variety of different-precision types for numbers in OCL. I have no clear opinion on whether we should we submit a request to OMG to add types representing Java's int, long and double to OCL. Perhaps, it is ok for OCL to be slower but less bloated?

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Feb 01, 2010 12:19

The OCL evaluation exhibits significant control-code bloat so that the arithmetic overhead is not that bad. I judged, without measurement, that the costs of regular precision conversions in the current code were comparable and perhaps worse than the cost of always using BigXXX.

When a code Synthesis tool is used to produce efficient Java/C++/VHDL/... it will benefit from a numeric analysis to determine the numeric bounds of intermediate results and so allow synthesis of short or whatever.

OCL should perhaps allow OCL definition rather than augmentation of classes so that I can specify

def: MyShort conformsTo Integer\ inv: -32768 <= self and self <= 32767

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Feb 01, 2011 02:35

The new evaluator use BigInteger and BigDecimal throughout behind IntegerValue and RealValue.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on May 27, 2011 06:40

Resolved for Indigo is 3.1.0 not 3.2.0.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on May 01, 2012 16:29

Bug 366368 imptoves the legacy support so that Long selectively working is improved to probably fully working. In particular numeric and tuple literals may be Long. Arithmetic corners for Integer/Long are tested.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on May 29, 2012 13:21

Closing all bugs resolved in Indigo.