spbolton closed this 3 months ago
The issue has occurred again: https://github.com/dotCMS/core/actions/runs/10102403821/job/27938346606
com.dotmarketing.business.DeterministicIdentifierAPITest ► Test_Language_Deterministic_Id[2: LanguageTestCase{expectedSeed='Language:sg:SAG', expectedHash=4713118, langCode='sg', countryCode='SAG', country=''}]
Failed test found in: /tmp/build-reports-test/build-reports-test-IT Tests MainSuite 2b/dotcms-integration/target/failsafe-reports/TEST-com.dotmarketing.business.DeterministicIdentifierAPITest.xml
Error: java.lang.AssertionError: expected:<4713118> but was:<1721947949243>
The recurrence of this issue seems to be caused by there being more than one DataGen for languages that still uses the same broken logic. So the fix we added, although it reduces the probability of creating a duplicate language code, does not prevent it entirely, because these other two DataGen classes can still create duplicates. If possible, we should combine the DataGen functionality so that we are not repeating this duplicate-prevention logic in each of these classes: https://github.com/dotCMS/core/blob/13e219eeeed1c2c88ac69493509289efef45fbdf/dotcms-integration/src/test/java/com/dotcms/datagen/LanguageCodeDataGen.java and https://github.com/dotCMS/core/blob/13e219eeeed1c2c88ac69493509289efef45fbdf/dotcms-integration/src/test/java/com/dotcms/datagen/LanguageDataGen.java#L11
The working logic currently lives in com.dotmarketing.portlets.languagesmanager.business.LanguageDataGen. We should either merge the functionality of the multiple DataGens or give them a shared utility method for the common logic.
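A minimal sketch of what such a shared utility could look like, assuming each DataGen passes in its own reserved-code check and its own DB lookup as lambdas. `LanguageCodeSupport` and `nextUnusedLanguage` are hypothetical names, not existing dotCMS classes, and the DB lookup is whatever query the calling DataGen already uses.

```java
import java.util.Locale;
import java.util.Random;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

/**
 * Hypothetical shared helper that every language DataGen could call, so the
 * "pick an unused language/country pair" logic lives in exactly one place.
 */
final class LanguageCodeSupport {

    private static final int MAX_ATTEMPTS = 1_000;

    private LanguageCodeSupport() { }

    /** Returns {@code length} random lowercase a-z letters. */
    private static String randomLetters(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    /**
     * Returns {languageCode, countryCode}.
     *
     * @param isReservedLanguageCode codes that specific tests rely on (assumed to be
     *                               supplied by the caller)
     * @param existsInDb             the caller's DB lookup (hypothetical; each DataGen
     *                               would pass its own query)
     */
    static String[] nextUnusedLanguage(Random random,
                                       Predicate<String> isReservedLanguageCode,
                                       BiPredicate<String, String> existsInDb) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            String languageCode = randomLetters(random, 2);
            String countryCode = randomLetters(random, 2).toUpperCase(Locale.ROOT);
            // The reserved check looks at the LANGUAGE code, not the country code.
            if (isReservedLanguageCode.test(languageCode)) {
                continue;
            }
            if (!existsInDb.test(languageCode, countryCode)) {
                return new String[] {languageCode, countryCode};
            }
        }
        throw new IllegalStateException(
                "No unused language code found after " + MAX_ATTEMPTS + " attempts");
    }
}
```

With the duplicate-prevention rules behind one method, `LanguageCodeDataGen` and both `LanguageDataGen` classes would only differ in how they persist the result.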
Just realized a possibly better option for the language generator, though. Rather than randomly generating the four characters for a language and then checking that the result does not match a code already used in the tests or present in the DB, we could always make sure the generated codes contain a letter we otherwise do not use, e.g. start with "x": "xa-fg", "xg-ss". This would also help to identify generated codes. We would still need to check whether the code already exists in the DB, as com.dotmarketing.portlets.languagesmanager.business.LanguageDataGen does, but it removes the need to maintain a list of codes already used in the tests.

I am not sure whether randomness is actually required here rather than just uniqueness. If we have to query the DB when generating the code anyway, we could fetch all codes starting with "x" and simply add the next one by incrementing, e.g. xa-aa, xa-ab, xa-ac. I'm not sure there is any particular benefit to either solution, other than that incrementing would show the order in which the codes were created when they appear in the logs.
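A minimal sketch of the incrementing "x" scheme, assuming generated codes always have the form `x?-??` in lowercase a-z and that the caller supplies the result of a DB query for codes starting with "x". `SequentialTestLanguageCodes` is a hypothetical name. The three free letters are treated as a base-26 counter, which gives 26^3 = 17,576 possible codes.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: every generated code looks like x?-?? and the next
 * code is the highest existing one plus one (base-26 over the three free letters).
 */
final class SequentialTestLanguageCodes {

    private static final Pattern GENERATED = Pattern.compile("x([a-z])-([a-z])([a-z])");
    // One free letter in the language part, two in the country part.
    private static final int CAPACITY = 26 * 26 * 26;

    private SequentialTestLanguageCodes() { }

    /** existingCodes: e.g. the codes starting with "x" already in the DB; other codes are ignored. */
    static String next(Collection<String> existingCodes) {
        int highest = -1;
        for (String code : existingCodes) {
            // Normalise case in case the country part is stored upper-case.
            Matcher m = GENERATED.matcher(code.toLowerCase(Locale.ROOT));
            if (m.matches()) {
                int index = (m.group(1).charAt(0) - 'a') * 676
                        + (m.group(2).charAt(0) - 'a') * 26
                        + (m.group(3).charAt(0) - 'a');
                highest = Math.max(highest, index);
            }
        }
        int next = highest + 1;
        if (next >= CAPACITY) {
            throw new IllegalStateException("All " + CAPACITY + " generated codes are in use");
        }
        return "x" + (char) ('a' + next / 676) + "-"
                + (char) ('a' + (next / 26) % 26)
                + (char) ('a' + next % 26);
    }
}
```

For example, `next(List.of("xa-aa", "xa-ab"))` returns `"xa-ac"` and `next(List.of("xa-zz"))` rolls over to `"xb-aa"`. If more than one process can insert languages into the same DB at the same time, the query-then-insert step is racy, so the DB existence check is still needed.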
LanguageDataGen creates random languages for tests, but the randomness is limited to four a-z characters, e.g. en-us.
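To put that limit into numbers: four a-z letters give 26^4 = 456,976 language/country combinations, but the two-letter language part alone has only 26^2 = 676 values. Assuming, as this thread suggests, that a clash on the language code alone (e.g. "sg") is enough to break the test, and that each generated language picks its code uniformly and independently, the chance that n generated languages include that code is 1 - (675/676)^n. The number of languages a suite actually generates is not known from the logs; `LanguageCodeCollisionOdds` is a hypothetical helper for the arithmetic only.

```java
/** Hypothetical helper: collision arithmetic for random two-letter language codes. */
final class LanguageCodeCollisionOdds {

    /** Distinct two-letter a-z language codes. */
    static final int TWO_LETTER_CODES = 26 * 26; // 676

    /** Distinct language+country pairs made of four a-z letters. */
    static final int FOUR_LETTER_CODES = TWO_LETTER_CODES * TWO_LETTER_CODES; // 456,976

    private LanguageCodeCollisionOdds() { }

    /** P(at least one of n independently generated language codes equals one specific code). */
    static double probabilityOfHittingLanguageCode(int n) {
        return 1.0 - Math.pow((TWO_LETTER_CODES - 1) / (double) TWO_LETTER_CODES, n);
    }
}
```

Under these assumptions, 100 generated languages already give roughly a 14% chance of hitting one specific language code, which is why excluding only country codes is not enough.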
There is some logic to prevent certain known codes from being generated, but DeterministicIdentifierAPITest does not use those codes, and the check seems to compare against the country code instead of the language code. If the language already exists, DeterministicIdentifierAPITest falls back to a unique, system-time-based value, which causes the test to fail.
We need to fix the check so that it prevents known "language" codes from being generated, and add the values used in DeterministicIdentifierAPITest to that list.
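A minimal sketch of the corrected check, contrasted with the mismatch described above. This is illustrative only and not the actual dotCMS code: `ReservedLanguageCodes`, `isReservedBuggy` and `isReserved` are hypothetical names. Only "sg" is taken from the DeterministicIdentifierAPITest failure (`langCode='sg'`); the other entries are placeholders for the values that test really uses.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the reserved-code check for generated test languages. */
final class ReservedLanguageCodes {

    // "sg" comes from the DeterministicIdentifierAPITest failure (langCode='sg', countryCode='SAG').
    // "en" and "es" are placeholders; the real list would mirror that test's cases.
    static final Set<String> RESERVED = Set.of("sg", "en", "es");

    private ReservedLanguageCodes() { }

    /** Illustrates the mismatch: the reserved list is consulted with the country code. */
    static boolean isReservedBuggy(String languageCode, String countryCode) {
        return RESERVED.contains(countryCode.toLowerCase(Locale.ROOT));
    }

    /** Corrected: the generated language code is what gets compared with the reserved list. */
    static boolean isReserved(String languageCode, String countryCode) {
        return RESERVED.contains(languageCode.toLowerCase(Locale.ROOT));
    }
}
```

With the buggy shape, a generated pair such as `sg-AB` slips through because "ab" is not reserved; the corrected check rejects it regardless of the country code.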
Seen in the merge queue: https://github.com/dotCMS/core/actions/runs/9203251769
(https://github.com/dotCMS/core/files/15418640/10_Merge.Group.Test._.JVM.IT.Tests.-.JDK.11.MainSuite.2b.txt)