Closed qifeng-bai closed 1 month ago
4.5.0 Snapshot has been deployed on lists-test. Database has been reset
We can click "Rematch" button to do full rematching. After the rematching is on process, we will see:
By click, we can see the progress:
@charvolant
We have noticed that the process is slowing down when it processes large amount species:
The list dr18679, which has 166,350 records. The time completed for the first 500 records starts from 2 seconds, increased to 14seconds (~7 seconds in reading those 500 records) after 147,500 of 166,350 has been processed.
Cpu hits > 85%, Memory 6.2G/7.6.
CPU reduced to 2% 15 secs after the process is completed.
We believed that the offset relating Hibernate pagination is the possible reason. The fellowing query were used in the rematching process
SpeciesListItem.findAllByDataResourceUid(id, [max: BATCH_SIZE, offset: offset])
` def c = SpeciesListItem.createCriteria()
items = c.list(max: BATCH_SIZE, offset: offset) {
order "id", "desc"
le("id", beforeId)
}`
I built a test case to check if the assumption is correct. I use two different query to query from the first available species id, then do 50,000 offset, 50,500, 100,000 offset. The reading speed should obviously reduce if our assumption is correct.
And also, I recorded the id of the first available specie id, No. 50,000, 100,000 id, and used the where id > offset
to fetch 500 records, to check if there is any difference.
The results shew the reading speeds have no obvious difference. The each reading process has been finished in 1 second.
Here is the result:
2024-02-08 16:00:59.354 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : The list {dr18679} has totla 166350
2024-02-08 16:00:59.354 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching 500 records with offset 0
2024-02-08 16:00:59.923 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching completed
2024-02-08 16:01:02.928 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching 500 records with offset 50000
2024-02-08 16:01:03.053 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching completed
2024-02-08 16:01:03.053 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching 500 records with offset 50500
2024-02-08 16:01:03.134 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching completed
2024-02-08 16:01:06.139 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching 500 records with from 2*50000
2024-02-08 16:01:06.341 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching completed
2024-02-08 16:01:09.341 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Testing clause where id > ??
2024-02-08 16:01:09.342 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching 500 records from the first item
2024-02-08 16:01:09.851 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching completed
2024-02-08 16:01:12.857 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching 500 records from NO.50,000
2024-02-08 16:01:12.881 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching completed
2024-02-08 16:01:15.887 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching 500 records from No. 100,000
2024-02-08 16:01:15.911 INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService : Fetching completed
Few attempts: DB:SpeciesListItem * :1 MatchedSpecies. When a rematched species is found , first find it from MatchedSpecies. Link it to the speciesListItem if found. Or created a new MatchedSpecies and then link.
DO NOT WORK. We batch processing 500 species (at this moment) in a transaction. So when we find a new matched species and save to DB, but this record is actually not pushed into DB until the transaction is completed
Performance records: Process the largest list of dr18679 (6 month ago) , has 166350 species
For each 500 species rematching process, the time elapsed gradually increased from ~2.6s to ~8.5s, The process CPU load increased from ~12% to ~26.5% Reading speed for each 500 species increased from 0.5s to 1s
However, when system started rematching 151219 species in the list dr18685 after the previous list is completed reading speed was reduced to 0.023s. Reason is yet known [Uploading cpu_log.txt…]()
We believed that querying a large dataset with offset slows down the query. Using ScorllableResults would be faster.
However, when we use this snippet of reading code. It ended with ~1s slower than 'offset'
def session = sessionFactory.currentSession
def hql = "FROM SpeciesListItem WHERE dataResourceUid = :listDRId AND id > :startId ORDER BY id "
def query = session.createQuery(hql)
query.setParameter("listDRId", listDRId)
query.setParameter("startId", startId)
def scrollableResults = query.scroll(ScrollMode.FORWARD_ONLY)
while (scrollableResults.next()) {
def species = (SpeciesListItem) scrollableResults.get(0)
items.add(species)
if (items.size() > BATCH_SIZE ) {
startId = species.id
break;
}
}
scrollableResults.close()
log.info("reading time: ${TimeCategory.minus(new Date(), `startReading)}")`
Rematched 500 of 166350 completed, time elapsed: 3.854 seconds, System CPU load: 0.0%, Process CPU load : 12.063%
reading time: 1.591 seconds
Rematched 1000 of 166350 completed, time elapsed: 1.823 seconds, System CPU load: 60.8824%, Process CPU load : 12.8777%
reading time: 1.721 seconds
Rematched 1500 of 166350 completed, time elapsed: 1.609 seconds, System CPU load: 53.8346%, Process CPU load : 12.4646%
reading time: 1.566 seconds
**
For improve the reading speed of scrollableResult, we need to keep Session is open and scrollableResult remains open after , for example, 500 records were loaded and being processed. The benefit is the reading speed maintain stable ~0.8s, compare with Offset solution. the time won't increase with the more records have been read
In this way, we need to create a new Session, with long timeout to ~4Hours for the largest list. Otherwise it may lead to ScrollableResults "expiring" if the cursor is left open too long without being used.
**
2024-06-07 12:02:03.907 INFO --- au.org.ala.specieslist.HelperService : Reading 161000 / 166350 took 0.681 seconds
2024-06-07 12:02:04.673 INFO --- au.org.ala.specieslist.HelperService : Reading 161500 / 166350 took 0.766 seconds
2024-06-07 12:02:05.516 INFO --- au.org.ala.specieslist.HelperService : Reading 162000 / 166350 took 0.843 seconds
2024-06-07 12:02:06.352 INFO --- au.org.ala.specieslist.HelperService : Reading 162500 / 166350 took 0.836 seconds
2024-06-07 12:02:07.115 INFO --- au.org.ala.specieslist.HelperService : Reading 163000 / 166350 took 0.763 seconds
2024-06-07 12:02:07.740 INFO --- au.org.ala.specieslist.HelperService : Reading 163500 / 166350 took 0.625 seconds
2024-06-07 12:02:08.674 INFO --- au.org.ala.specieslist.HelperService : Reading 164000 / 166350 took 0.934 seconds
2024-06-07 12:02:09.709 INFO --- au.org.ala.specieslist.HelperService : Reading 164500 / 166350 took 1.034 seconds
2024-06-07 12:02:10.726 INFO --- au.org.ala.specieslist.HelperService : Reading 165000 / 166350 took 1.017 seconds
2024-06-07 12:02:11.492 INFO --- au.org.ala.specieslist.HelperService : Reading 165500 / 166350 took 0.766 seconds
2024-06-07 12:02:12.213 INFO --- au.org.ala.specieslist.HelperService : Reading 166000 / 166350 took 0.720 seconds
If we use
query.setParameter("listDRId", listDRId)
query.setFirstResult(offset) // Set the starting point of the results
query.setMaxResults(BATCH_SIZE) // Limit the number of results to fetch
It does not improve the reading process. The reading time starts from ~0.8s, increase slightly to 1.2s, (Without CPU burden caused by Rematching
2024-06-07 10:53:26.775 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 1500 / 166350 took 0.785 secondss
2024-06-07 10:53:27.431 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 2000 / 166350 took 0.656 secondss
2024-06-07 10:53:28.146 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 2500 / 166350 took 0.715 secondss
2024-06-07 10:53:29.066 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 3000 / 166350 took 0.920 secondss
2024-06-07 10:53:29.822 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 3500 / 166350 took 0.756 secondss
2024-06-07 10:58:26.471 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 164500 / 166350 took 1.221 secondss
2024-06-07 10:58:27.701 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 165000 / 166350 took 1.230 secondss
2024-06-07 10:58:28.830 INFO --- au.org.ala.specieslist.HelperService : Reading 500 from 165500 / 166350 took 1.129 secondss
Using a new session with a long timeout connection, the reading / writing speeds on processing a large list is very stable.
Starts with reading/writing : ~0.025S/0.838s from the beginning, and R/W 0.024S/0.98S
Total time cost to complete the list of dr18679 [166350] : 18 minutes, 26.660 seconds
2024-06-13 13:12:00.258 INFO --- au.org.ala.specieslist.HelperService : Reading 500 / 166350 took 0.025 seconds
2024-06-13 13:12:02.014 INFO --- au.org.ala.specieslist.HelperService : Rematching took 1.748 seconds
2024-06-13 13:12:02.873 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.848 seconds
2024-06-13 13:12:02.889 INFO --- au.org.ala.specieslist.HelperService : Reading 1000 / 166350 took 0.014 seconds
2024-06-13 13:12:04.997 INFO --- au.org.ala.specieslist.HelperService : Rematching took 2.108 seconds
2024-06-13 13:12:05.855 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.858 seconds
2024-06-13 13:12:05.872 INFO --- au.org.ala.specieslist.HelperService : Reading 1500 / 166350 took 0.017 seconds
2024-06-13 13:12:08.169 INFO --- au.org.ala.specieslist.HelperService : Rematching took 2.297 seconds
2024-06-13 13:12:09.037 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.867 seconds
2024-06-13 13:12:09.054 INFO --- au.org.ala.specieslist.HelperService : Reading 2000 / 166350 took 0.017 seconds
2024-06-13 13:12:10.977 INFO --- au.org.ala.specieslist.HelperService : Rematching took 1.922 seconds
2024-06-13 13:12:11.824 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.847 seconds
2024-06-13 13:29:37.804 INFO --- au.org.ala.specieslist.HelperService : Reading 165000 / 166350 took 0.027 seconds
2024-06-13 13:29:40.376 INFO --- au.org.ala.specieslist.HelperService : Rematching took 2.572 seconds
2024-06-13 13:29:41.300 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.923 seconds
2024-06-13 13:29:41.326 INFO --- au.org.ala.specieslist.HelperService : Reading 165500 / 166350 took 0.026 seconds
2024-06-13 13:29:43.088 INFO --- au.org.ala.specieslist.HelperService : Rematching took 1.762 seconds
2024-06-13 13:29:44.009 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.921 seconds
2024-06-13 13:29:44.033 INFO --- au.org.ala.specieslist.HelperService : Reading 166000 / 166350 took 0.024 seconds
2024-06-13 13:29:46.641 INFO --- au.org.ala.specieslist.HelperService : Rematching took 2.608 seconds
2024-06-13 13:29:47.609 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.968 seconds
2024-06-13 13:29:49.607 INFO --- au.org.ala.specieslist.HelperService : Rematching took 1.977 seconds
2024-06-13 13:29:50.343 INFO --- au.org.ala.specieslist.HelperService : Saving to DB took 0.736 seconds
Full rematch testing:
~166K ~14 minutes -> 200/s
~38K ~3m ->220/s
~1017, 6s -> 169/s
~248, 2s -> 124/s
~20, 2s -> 10/s
2024-06-14 09:40:49.416 INFO --- au.org.ala.specieslist.HelperService : Rematching 166350 species in the list dr18679
2024-06-14 09:55:22.167 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr18679 [166350] : 14 minutes, 32.746 seconds
2024-06-14 09:55:25.278 INFO --- au.org.ala.specieslist.HelperService : Rematching 151219 species in the list dr18685
2024-06-14 10:08:33.134 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr18685 [151219] : 13 minutes, 7.856 seconds
2024-06-14 10:08:35.488 INFO --- au.org.ala.specieslist.HelperService : Rematching 38412 species in the list dr13406
2024-06-14 10:11:33.622 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr13406 [38412] : 2 minutes, 58.134 seconds
2024-06-14 10:11:35.864 INFO --- au.org.ala.specieslist.HelperService : Rematching 35277 species in the list dr8309
2024-06-14 10:14:15.864 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr8309 [35277] : 2 minutes, 40.000 seconds
2024-06-14 10:51:23.047 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr8046 [7307] : 39.216 seconds
2024-06-14 10:51:25.761 INFO --- au.org.ala.specieslist.HelperService : Rematching 7147 species in the list dr2720
2024-06-14 10:51:55.762 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr2720 [7147] : 30.001 seconds
2024-06-14 10:51:58.325 INFO --- au.org.ala.specieslist.HelperService : Rematching 7063 species in the list dr7781
2024-06-14 10:52:26.489 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr7781 [7063] : 28.164 seconds
2024-06-14 10:52:29.533 INFO --- au.org.ala.specieslist.HelperService : Rematching 7031 species in the list dr22372
2024-06-14 12:03:07.785 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr8126 [1017] : 6.634 seconds
2024-06-14 12:03:11.480 INFO --- au.org.ala.specieslist.HelperService : Rematching 1017 species in the list dr8127
2024-06-14 12:03:18.303 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr8127 [1017] : 6.823 seconds
2024-06-14 12:03:21.941 INFO --- au.org.ala.specieslist.HelperService : Rematching 1017 species in the list dr9045
2024-06-14 12:03:27.811 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr9045 [1017] : 5.870 seconds
2024-06-14 12:25:36.580 INFO --- au.org.ala.specieslist.HelperService : Rematching 250 species in the list dr1946
2024-06-14 12:25:41.740 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr1946 [250] : 5.160 seconds
2024-06-14 12:25:45.659 INFO --- au.org.ala.specieslist.HelperService : Rematching 250 species in the list dr1949
2024-06-14 12:25:51.606 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr1949 [250] : 5.947 seconds
2024-06-14 12:25:55.373 INFO --- au.org.ala.specieslist.HelperService : Rematching 248 species in the list dr7389
2024-06-14 12:25:58.606 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr7389 [248] : 3.233 seconds
2024-06-14 12:26:02.377 INFO --- au.org.ala.specieslist.HelperService : Rematching 242 species in the list dr6796
2024-06-14 13:49:56.551 INFO --- au.org.ala.specieslist.HelperService : Rematching 38 species in the list dr11501
2024-06-14 13:50:00.274 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr11501 [19] : 3.723 seconds
2024-06-14 13:50:04.347 INFO --- au.org.ala.specieslist.HelperService : Rematching 19 species in the list dr11500
2024-06-14 13:50:06.873 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr11500 [19] : 2.526 seconds
2024-06-14 13:50:10.948 INFO --- au.org.ala.specieslist.HelperService : Rematching 38 species in the list dr11501
2024-06-14 13:50:14.165 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr11501 [19] : 3.217 seconds
2024-06-14 13:50:18.137 INFO --- au.org.ala.specieslist.HelperService : Rematching 19 species in the list dr11502
2024-06-14 13:50:20.523 INFO --- au.org.ala.specieslist.HelperService : Total time cost to complete the list of dr11502 [19] : 2.386 seconds
2024-06-14 13:50:24.471 INFO --- au.org.ala.specieslist.HelperService : Rematching 19 species in the list dr11503
Rematching on lists-test was completed :
Total time to complete 3631/3631 lists : 1 hours, 10 minutes, 54.559 seconds
There is still a small issue in rematchLog: https://lists-test.ala.org.au/ws/rematchLog/1. [Deleted] The last two logs were not written into the database
2024-06-21 17:11:11.248 INFO --- [0.1-8080-exec-9] au.org.ala.specieslist.HelperService : dr22734 [ 1 ] completed, time cost : 0.303 seconds
2024-06-21 17:11:11.250 INFO --- [0.1-8080-exec-9] au.org.ala.specieslist.HelperService : Rematching 1 species in the list dr22735
2024-06-21 17:11:11.551 INFO --- [0.1-8080-exec-9] au.org.ala.specieslist.HelperService : dr22735 [ 1 ] completed, time cost : 0.301 seconds
2024-06-21 17:11:11.568 INFO --- [0.1-8080-exec-9] a.o.a.specieslist.SpeciesListController : Total time to complete 3631/3631 lists : 1 hours, 10 minutes, 54.559 seconds
Probably why the status of the rematching does not change as well, remains at RUNNING
Another round rematching was tested. It was completed in 2 hours. https://lists-test.ala.org.au/ws/rematchLog/2
@hamzajaved-csiro and I figured out a workaround to solve the issue the last rematching log was not written in DB. However, we did not understand the exact reasons why the last log was not saved .
Once a namematching service is reindexed, the lists need to have a way to rematch all existing lists. add a function to trace the progress. We can continue aborted mission if necessary
Memo: Rematching after > 100,000 species, starts significantly affect CPU load