alexander-bloss opened this issue 4 months ago
Hey @alexander-bloss!
I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue.
To improve or optimize the behavior of the query SELECT * FROM refs ORDER BY created_date desc in the refdiff_task_data.go file, you should add an index on the created_date column. This can significantly speed up the sorting operation, especially given that the refs table holds over 1GB of data.
Here is how you can create an index on the created_date column:
CREATE INDEX idx_created_date ON refs(created_date);
Adding this index helps the database engine quickly locate and sort the rows based on the created_date column, thereby improving the performance of the query [1].
Additionally, ensure that the created_date column is populated with data. If it is currently empty, you might need to update your data insertion logic to include values for this column [2].
To continue talking to Dosu, mention @dosu.
@dosu @warren830, we've already followed those instructions, but it is still not working.
I think DevLake itself should populate this column, right? It doesn't make sense to do it any other way.
To resolve the issue of the SELECT * FROM refs ORDER BY created_date desc query taking too long, you should ensure that the created_date column is populated. This column is intended to hold the creation date of each reference (such as a tag or branch) and to be filled in during DevLake's normal operation [1].
If the column is not being populated automatically, check whether a specific plugin or module is responsible for it. For example, the ExtractApiRepositories function in the backend/plugins/gitee/tasks/repo_extractor.go file is responsible for extracting and populating the created_date column in the refs table [2].
Here is a general approach to populate the created_date column if it is currently empty:
Update the created_date column with a default value:
UPDATE refs
SET created_date = NOW()
WHERE created_date IS NULL;
Ensure that the data extraction logic includes the created_date:
Verify that the data extraction processes or plugins are correctly setting the created_date field when new data is inserted into the refs table.
After ensuring the created_date column is populated, re-run your query:
SELECT * FROM refs ORDER BY created_date DESC;
These steps should help improve the performance of your query.
BTW, I'm using the Azure DevOps Go Connection/Plugin.
@dosu, I got it! But I need DevLake to do this.
To ensure that DevLake populates the created_date column in the refs table when using the Azure DevOps Go Connection/Plugin, you need to verify that the relevant functions handle this field correctly.
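In the Azure DevOps Go Connection/Plugin, the ConvertApiTimelineRecords function includes logic for setting a created date. Note, however, that the snippet below builds devops.CICDTask records (pipeline tasks), not refs, so it sets TaskDatesInfo.CreatedDate on those tasks rather than the created_date column of the refs table. Here is the relevant code snippet: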
createdAt := time.Now()
if tlRecord.StartTime != nil {
createdAt = *tlRecord.StartTime
}
domainTask := &devops.CICDTask{
DomainEntity: domainlayer.DomainEntity{
Id: tlRecordIdGen.Generate(data.Options.ConnectionId, tlRecord.RecordId, tlRecord.BuildId),
},
Name: tlRecord.Name,
PipelineId: buildIdGen.Generate(data.Options.ConnectionId, tlRecord.BuildId),
Result: devops.GetResult(cicdTaskResultRule, tlRecord.Result),
Status: devops.GetStatus(cicdTaskStatusRule, tlRecord.State),
OriginalStatus: tlRecord.State,
OriginalResult: tlRecord.Result,
DurationSec: duration,
Environment: data.RegexEnricher.ReturnNameIfMatched(devops.PRODUCTION, tlRecord.Name),
Type: data.RegexEnricher.ReturnNameIfMatched(devops.DEPLOYMENT, tlRecord.Name),
TaskDatesInfo: devops.TaskDatesInfo{
CreatedDate: createdAt,
StartedDate: tlRecord.StartTime,
FinishedDate: tlRecord.FinishTime,
},
CicdScopeId: repoIdGen.Generate(data.Options.ConnectionId, data.Options.RepositoryId),
}
This function sets the task's created date to the timeline record's StartTime when it is available and falls back to the current time otherwise [1].
Ensure that this function is correctly invoked during the data extraction process. If the created_date field is still not being populated, you may need to review the data flow and ensure that the ConvertApiTimelineRecords function is properly integrated into your pipeline.
I think we should filter refs to the ids prefixed with the repo_id, which could take advantage of the primary key index.
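A minimal Go sketch of that suggestion follows. It assumes ref ids start with the repo id followed by a colon (an assumption about the id format, not confirmed in this thread) and that the database is MySQL, where backslash is the default LIKE escape character and ? is the placeholder style. refsByRepoQuery and the example id are hypothetical. Because the pattern has a fixed prefix and only a trailing wildcard, the optimizer can typically use a range scan on the primary key instead of reading the whole table.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper escapes the characters LIKE treats specially, using backslash,
// which is MySQL's default LIKE escape character. NewReplacer does not rescan
// its own output, so inserted backslashes are not escaped twice.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// refsByRepoQuery returns a parameterized query and its LIKE pattern that
// select only the refs whose id starts with "<repoId>:".
func refsByRepoQuery(repoId string) (string, string) {
	query := "SELECT * FROM refs WHERE id LIKE ? ORDER BY created_date DESC"
	pattern := likeEscaper.Replace(repoId+":") + "%"
	return query, pattern
}

func main() {
	query, arg := refsByRepoQuery("example:Repo:1:my_repo") // made-up repo id
	fmt.Println(query)
	fmt.Println(arg) // example:Repo:1:my\_repo:%
}
```

The query and argument can be passed to database/sql, for example db.Query(query, arg). Including the trailing colon in the prefix keeps the repo id example:Repo:1:my_repo from also matching example:Repo:1:my_repo2. PostgreSQL would need $1 instead of ? as the placeholder.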
@alexander-bloss How about this issue?
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.
Maybe I am having a problem with this change. Check it out:
The problem is: 'SELECT * FROM refs ORDER BY created_date desc' is taking too long (15+ minutes) to complete, and a simple (5 repositories) Azure DevOps extraction takes an hour to complete. Is there anything I can do to improve/optimize this behaviour?
Originally posted by @alexander-bloss in https://github.com/apache/incubator-devlake/issues/1474#issuecomment-2207222757