Onto-Med / top-frontend

JavaScript-based frontend of the TOP Framework
MIT License

Some fixes before pulling changes into main #346

Closed: fmatthies closed this 2 months ago

fmatthies commented 4 months ago

[Picture 1]

[Picture 2]

ChristophB commented 4 months ago

> @ChristophB: I don't know how extensive a change this would be, but right now in EntityForm the expression-input for a CompositeConcept dictates that Constants are excluded. However, the Distance function requires a constant (or rather an integer) as its second parameter.

We could remove this exclusion from the expression-input component. There is currently no way to check if an argument is applicable to a function, so constants will be available everywhere in the concept expression editor.

ChristophB commented 4 months ago

> We could remove this exclusion from the expression-input component. There is currently no way to check if an argument is applicable to a function, so constants will be available everywhere in the concept expression editor.

Some more adjustments are needed here. Constants are not limited to integers.

Edit: Done in 1583fda4
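Since there is currently no way to check whether an argument is applicable to a function, one option would be to attach per-parameter restrictions to each function and filter operands per argument slot, instead of excluding constants globally. Below is a minimal sketch of that idea. `ArgumentKind`, `Operand`, `ParameterSpec`, `FunctionSignature`, `isApplicable` and the shape of the `Distance` entry are all hypothetical, not the TOP Framework's actual model; only the integer constant as Distance's second parameter comes from this thread.

```typescript
// Hypothetical operand kinds offered in the concept expression editor.
type ArgumentKind = "concept" | "phenotype" | "constant";

// Hypothetical operand; `value` is only set for constants, which may be
// numbers or strings (constants are not limited to integers).
interface Operand {
  kind: ArgumentKind;
  value?: number | string;
}

// Hypothetical restriction for one function parameter.
interface ParameterSpec {
  kinds: ArgumentKind[];
  integerOnly?: boolean; // extra check applied to constant arguments
}

interface FunctionSignature {
  name: string;
  parameters: ParameterSpec[];
}

// Assumed signature: the thread only states that Distance's second
// parameter is an integer constant; the first parameter is a guess.
const distance: FunctionSignature = {
  name: "Distance",
  parameters: [
    { kinds: ["concept", "phenotype"] },
    { kinds: ["constant"], integerOnly: true },
  ],
};

// True if `operand` may be used as argument number `index` (0-based) of `fn`.
function isApplicable(fn: FunctionSignature, index: number, operand: Operand): boolean {
  const spec = fn.parameters[index];
  if (spec === undefined || !spec.kinds.includes(operand.kind)) return false;
  if (operand.kind === "constant" && spec.integerOnly === true) {
    return typeof operand.value === "number" && Number.isInteger(operand.value);
  }
  return true;
}
```

With such a table, the expression-input could offer constants only where a signature allows them, and the integer check handles the fact that constants in general are not limited to integers.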

ChristophB commented 4 months ago

> @fmatthies / @ChristophB: We need to look into the status text, as it doesn't seem to update within the Dialog. If it's closed and opened again, all is well.

This is still hard for me to reproduce without a proper document server and graph API. The status is derived from the computed value graphPipelineStatus. My guess is that pipeline responses might be missing the status value.
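A minimal sketch of a defensive status derivation, assuming the pipeline response is an object with an optional `status` field. The `PipelineResponse` shape, the `RUNNING` value and the `UNKNOWN` fallback are assumptions; the function is written without a framework so it runs standalone.

```typescript
// SUCCESSFUL and FAILED appear in this thread; RUNNING and UNKNOWN are assumptions.
type DisplayedStatus = "RUNNING" | "SUCCESSFUL" | "FAILED" | "UNKNOWN";

// Hypothetical response shape; `status` may be absent or unexpected.
interface PipelineResponse {
  name: string;
  status?: string;
}

const KNOWN_STATUSES: readonly string[] = ["RUNNING", "SUCCESSFUL", "FAILED"];

// Maps a (possibly incomplete) response to the status shown in the dialog.
// A missing or unrecognised status becomes "UNKNOWN", so an incomplete
// response is visible instead of being mistaken for a real status.
function graphPipelineStatus(response: PipelineResponse | undefined): DisplayedStatus {
  const raw = response?.status?.toUpperCase();
  if (raw !== undefined && KNOWN_STATUSES.includes(raw)) {
    return raw as DisplayedStatus;
  }
  return "UNKNOWN";
}
```

If the component wraps this in Vue's `computed`, keep in mind that a computed value only re-evaluates when a reactive dependency it read changes. If the dialog holds a non-reactive copy of the response, the text would stay stale until the dialog is re-created, which is one possible explanation for the close-and-reopen behaviour.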

ChristophB commented 4 months ago

> @fmatthies / @ChristophB: I added a "Build query" button to the Documents page (see picture 1). I thought it would be nice to jump to the Queries page with the data source one came from already selected. However, before making it functional, I wanted to check whether this fits the design, because a repository needs to be selected as well.

Currently, the button just redirects to the query builder. I think this is fine for now. There are also two things to consider here:

  1. The document search page is adapter-based and completely unrelated to repositories, so the user has no way to select a repository before clicking the button.
  2. Repositories might not be configured for all data adapters. It is unclear whether a given adapter-repository combination may be used to perform queries.

ChristophB commented 4 months ago

> @fmatthies / @ChristophB: When coming from the query results ("Show data set"), the left menu is not updated (see picture 2), and clicking on Documents resets the URL but still shows the query results (one needs to hit F5 or switch to another TOP page and back).

I think the cause is that no data source is selected when the page is accessed via "Show data set". The desired data source can be propagated via the dataSource property of the DocumentSearchForm component or retrieved from the query object.
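The precedence described here (the `dataSource` property of DocumentSearchForm first, then the query object) could look like the following sketch. The `DataSource` and `Query` shapes, and the assumption that a query stores its data source as an ID, are hypothetical.

```typescript
// Hypothetical minimal shapes; the real entities carry more fields.
interface DataSource {
  id: string;
  title?: string;
}

interface Query {
  id: string;
  dataSource?: string; // assumption: ID of the data source the query ran on
}

// Picks the data source for the Documents page:
// 1. an explicitly passed `dataSource` prop wins,
// 2. otherwise the data source referenced by the query ("Show data set"),
// 3. otherwise none is selected.
function resolveDataSource(
  prop: DataSource | undefined,
  query: Query | undefined,
  available: DataSource[],
): DataSource | undefined {
  if (prop !== undefined) return prop;
  const id = query?.dataSource;
  if (id === undefined) return undefined;
  return available.find((ds) => ds.id === id);
}
```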

fmatthies commented 4 months ago

Okay, thanks for the heads up!

fmatthies commented 4 months ago

> @fmatthies / @ChristophB: I added a "Build query" button to the Documents page (see picture 1). I thought it would be nice to jump to the Queries page with the data source one came from already selected. However, before making it functional, I wanted to check whether this fits the design, because a repository needs to be selected as well.
>
> Currently, the button just redirects to the query builder. I think this is fine for now. There are also two things to consider here:
>
> 1. The document search page is adapter-based and completely unrelated to repositories, so the user has no way to select a repository before clicking the button.
>
> 2. Repositories might not be configured for all data adapters. It is unclear whether a given adapter-repository combination may be used to perform queries.

When the button is clicked, the dataSource is passed to the QueryBuilder as well. There, setRepository checks whether the data source is configured for this particular repo-organisation. If not, a notify warning informs the user; otherwise, the dataSource is pre-selected (a441570).
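The check described for setRepository can be sketched as a pure function. This is a minimal sketch, not the code from a441570: the `Repository` shape with a `dataSources` ID list, `QueryBuilderSelection`, and the `warn` callback standing in for the notify warning are assumptions.

```typescript
// Hypothetical repository shape; `dataSources` lists the IDs of the data
// sources configured for this repository (within its organisation).
interface Repository {
  id: string;
  organisationId: string;
  dataSources: string[];
}

// Stand-in for the UI's warning notification (hypothetical callback).
type WarnFn = (message: string) => void;

interface QueryBuilderSelection {
  repository: Repository;
  dataSource?: string;
}

// Selects `repository` and pre-selects `requestedDataSource` (passed from the
// Documents page) only if it is configured for that repository; otherwise
// the user is warned and no data source is pre-selected.
function setRepository(
  repository: Repository,
  requestedDataSource: string | undefined,
  warn: WarnFn,
): QueryBuilderSelection {
  if (requestedDataSource === undefined) return { repository };
  if (!repository.dataSources.includes(requestedDataSource)) {
    warn(`Data source '${requestedDataSource}' is not configured for repository '${repository.id}'.`);
    return { repository };
  }
  return { repository, dataSource: requestedDataSource };
}
```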

fmatthies commented 4 months ago

> @fmatthies / @ChristophB: When coming from the query results ("Show data set"), the left menu is not updated (see picture 2), and clicking on Documents resets the URL but still shows the query results (one needs to hit F5 or switch to another TOP page and back).
>
> I think the cause is that no data source is selected when the page is accessed via "Show data set". The desired data source can be propagated via the dataSource property of the DocumentSearchForm component or retrieved from the query object.

Hm, I just checked: I had already implemented that. The second picture also shows that the dataSource is read out successfully.

fmatthies commented 4 months ago

> @fmatthies / @ChristophB: We need to look into the status text, as it doesn't seem to update within the Dialog. If it's closed and opened again, all is well.
>
> This is still hard for me to reproduce without a proper document server and graph API. The status is derived from the computed value graphPipelineStatus. My guess is that pipeline responses might be missing the status value.

Not necessarily just for this issue, but in general for the backend (or, better yet, the adapters):

fmatthies commented 4 months ago

The two data source configs I use are attached (I added a .txt extension because GitHub does not allow uploading .yml files): Test_Data_Source_1.yml.txt, Test_Data_Source_2.yml.txt

ChristophB commented 3 months ago

Task 4 might be fixed now.

ChristophB commented 2 months ago

Still can't reproduce pipelines locally. I don't have documents.

fmatthies commented 2 months ago

Elasticsearch runs at 0.0.0.0:9008, and the index is "documents", if you need test documents.
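For pulling a few test documents directly, a request against Elasticsearch's `_search` endpoint on the `documents` index should be enough. The sketch only builds the request; `buildSearchRequest` and `SearchRequest` are hypothetical helpers, the port and index name come from the comment above, and `size` plus the `match_all` query are standard Elasticsearch query DSL.

```typescript
// Request description that can be handed to fetch() or any HTTP client.
interface SearchRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds an Elasticsearch `_search` request returning up to `size` documents
// from `index`, using the standard `match_all` query.
function buildSearchRequest(host: string, port: number, index: string, size = 10): SearchRequest {
  return {
    url: `http://${host}:${port}/${encodeURIComponent(index)}/_search`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ size, query: { match_all: {} } }),
  };
}

// 0.0.0.0 in the comment is presumably the address Elasticsearch listens on
// (all interfaces); a client on the same machine would typically use localhost.
const testDocsRequest = buildSearchRequest("localhost", 9008, "documents", 5);
```

The object can be sent with `fetch(testDocsRequest.url, testDocsRequest)`; matching documents are returned under `hits.hits` in the JSON response.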

ChristophB commented 2 months ago

This doesn't really help. When I try to run it with a local instance, the following error is raised:

{"error":"Couldn't find graph pickle 'test_data_source_3_graph.pickle'. Probably steps before failed; check the logs.","name":"test_data_source_3"} -- 500 Internal Server Error from GET http://localhost:9007/graph/statistics?process=Test_Data_Source_3

However, the initial response to startConceptGraphPipeline(...) has SUCCESSFUL as its status. There seems to be something off with either the backend or concept-graphs. Another request for the pipeline status is necessary to get the correct FAILED status.
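Until the backend reports the status reliably, the frontend could re-request the status instead of trusting the first response to startConceptGraphPipeline(...). A minimal sketch of such a workaround: `pollPipelineStatus`, the injected `fetchStatus` and `wait` functions, the `RUNNING` value and the "confirm SUCCESSFUL twice" rule are assumptions, not existing TOP code.

```typescript
// SUCCESSFUL and FAILED appear in this thread; RUNNING is an assumption.
type PipelineStatus = "RUNNING" | "SUCCESSFUL" | "FAILED";

// Hypothetical fetcher for the current pipeline status (e.g. a backend GET).
type StatusFetcher = () => Promise<PipelineStatus>;
// Injected delay so the sketch does not depend on a specific timer API.
type Wait = (ms: number) => Promise<void>;

// Re-requests the status instead of trusting the first response.
// FAILED is accepted immediately; SUCCESSFUL only after it has been reported
// `confirmations` times in a row. Gives up after `maxAttempts` requests and
// returns the last status seen.
async function pollPipelineStatus(
  fetchStatus: StatusFetcher,
  wait: Wait,
  { intervalMs = 2000, maxAttempts = 30, confirmations = 2 } = {},
): Promise<PipelineStatus> {
  let successStreak = 0;
  let last: PipelineStatus = "RUNNING";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    last = await fetchStatus();
    if (last === "FAILED") return last;
    successStreak = last === "SUCCESSFUL" ? successStreak + 1 : 0;
    if (successStreak >= confirmations) return last;
    await wait(intervalMs);
  }
  return last;
}
```

In a browser, `wait` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. This only papers over the problem; the underlying fix belongs in the backend or concept-graphs.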

fmatthies commented 2 months ago

Okay, thanks. I'll look into it. Could you provide me with your setup? It could be that I didn't update the concept-graphs-api on top-prod. I have always used my own local instance to be able to debug it.

ChristophB commented 2 months ago

I use a local instance as well, with branch "issues_4_5_improvements".

ChristophB commented 2 months ago

> clicking on Documents resets the url but still shows the query results (one needs to hit F5 or change to another TOP page and back)

This is somewhat intentional, because clicking on "Documents" has the same effect as clearing the query result: in both cases, the data source remains selected. Should we leave it this way?
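Treating both actions as one shared state transition keeps them consistent. A minimal sketch, assuming a hypothetical `DocumentsPageState` that holds the selected data source, the query ID and the listed document IDs; the real page may store this differently.

```typescript
// Hypothetical state of the Documents page.
interface DocumentsPageState {
  dataSource?: string; // ID of the selected data source
  queryId?: string;    // set when the page was opened via "Show data set"
  documentIds: string[];
}

// Shared handler for "clear query result" and clicking "Documents":
// the query-specific result is dropped, the selected data source is kept.
function clearQueryResult(state: DocumentsPageState): DocumentsPageState {
  return { dataSource: state.dataSource, documentIds: [] };
}
```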

fmatthies commented 2 months ago

> This doesn't really help. When I try to run it with a local instance, the following error is raised:
>
> {"error":"Couldn't find graph pickle 'test_data_source_3_graph.pickle'. Probably steps before failed; check the logs.","name":"test_data_source_3"} -- 500 Internal Server Error from GET http://localhost:9007/graph/statistics?process=Test_Data_Source_3
>
> However, the initial response to startConceptGraphPipeline(...) has SUCCESSFUL as its status. There seems to be something off with either the backend or concept-graphs. Another request for the pipeline status is necessary to get the correct FAILED status.

Can you see the logs of the concept-graphs-api? Normally, this error indicates that one of the steps in the graph creation process failed (processing documents, embedding phrases, clustering, or graph creation).
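Since each step appears to write its own pickle, the first missing file points to the step that failed. The sketch below assumes the `<name>_data`, `_embedding`, `_clustering` and `_graph` pickle names seen in the concept-graphs output and the error message in this thread, and that the process name is lower-cased (as `Test_Data_Source_3` becomes `test_data_source_3`). `firstMissingStep` is a hypothetical helper, not part of concept-graphs-api.

```typescript
// Pipeline steps in order, with the pickle suffix each one appears to write.
// Suffixes are inferred from the concept-graphs log and error in this thread.
const PIPELINE_STEPS = [
  { step: "process documents", suffix: "data" },
  { step: "embed phrases", suffix: "embedding" },
  { step: "clustering", suffix: "clustering" },
  { step: "graph creation", suffix: "graph" },
] as const;

// Returns the first step whose pickle is missing from `files` for the given
// process, or undefined if all expected pickles are present.
function firstMissingStep(processName: string, files: Iterable<string>): string | undefined {
  const present = new Set(files);
  // The thread shows the API lower-casing the name (Test_Data_Source_3 -> test_data_source_3).
  const base = processName.toLowerCase();
  for (const { step, suffix } of PIPELINE_STEPS) {
    if (!present.has(`${base}_${suffix}.pickle`)) return step;
  }
  return undefined;
}
```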

ChristophB commented 2 months ago

This is the output of concept-graphs:

concept-graphs-api-1 | INFO:main:Using process name 'test_data_source_3'
concept-graphs-api-1 | [2024-07-17 11:42:24,109] INFO in main_methods: Using process name 'test_data_source_3'
concept-graphs-api-1 | INFO:main:Using preset language settings for 'de'
concept-graphs-api-1 | [2024-07-17 11:42:24,110] INFO in main_methods: Using preset language settings for 'de'
concept-graphs-api-1 | INFO:main:Skipping present saved steps
concept-graphs-api-1 | [2024-07-17 11:42:24,110] INFO in main_methods: Skipping present saved steps
concept-graphs-api-1 | INFO:main:Reading config (data) ...
concept-graphs-api-1 | [2024-07-17 11:42:24,136] INFO in main_methods: Reading config (data) ...
concept-graphs-api-1 | INFO:main:No config file provided; using default values
concept-graphs-api-1 | [2024-07-17 11:42:24,136] INFO in preprocessing_util: No config file provided; using default values
concept-graphs-api-1 | INFO:main:Parsed the following arguments for <preprocessing_util.PreprocessingUtil object at 0x7f2378f83850>:
concept-graphs-api-1 | {'spacy_model': 'de_dep_news_trf', 'file_encoding': 'utf-8', 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | [2024-07-17 11:42:24,136] INFO in main_methods: Parsed the following arguments for <preprocessing_util.PreprocessingUtil object at 0x7f2378f83850>:
concept-graphs-api-1 | {'spacy_model': 'de_dep_news_trf', 'file_encoding': 'utf-8', 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | INFO:main:Labels will be extracted from the document server if the field 'label' is present.
concept-graphs-api-1 | [2024-07-17 11:42:24,137] INFO in preprocessing_util: Labels will be extracted from the document server if the field 'label' is present.
concept-graphs-api-1 | INFO:main:Reading config (embedding) ...
concept-graphs-api-1 | [2024-07-17 11:42:24,137] INFO in main_methods: Reading config (embedding) ...
concept-graphs-api-1 | INFO:main:No config file provided; using default values
concept-graphs-api-1 | [2024-07-17 11:42:24,137] INFO in embedding_util: No config file provided; using default values
concept-graphs-api-1 | INFO:main:Parsed the following arguments for <embedding_util.PhraseEmbeddingUtil object at 0x7f23790c3430>:
concept-graphs-api-1 | {'model': 'Sahajtomar/German-semantic', 'down_scale_algorithm': None, 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | [2024-07-17 11:42:24,137] INFO in main_methods: Parsed the following arguments for <embedding_util.PhraseEmbeddingUtil object at 0x7f23790c3430>:
concept-graphs-api-1 | {'model': 'Sahajtomar/German-semantic', 'down_scale_algorithm': None, 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | INFO:main:Reading config (clustering) ...
concept-graphs-api-1 | [2024-07-17 11:42:24,138] INFO in main_methods: Reading config (clustering) ...
concept-graphs-api-1 | INFO:main:No config file provided; using default values
concept-graphs-api-1 | [2024-07-17 11:42:24,138] INFO in clustering_util: No config file provided; using default values
concept-graphs-api-1 | INFO:main:Parsed the following arguments for <clustering_util.ClusteringUtil object at 0x7f23790c3310>:
concept-graphs-api-1 | {'algorithm': 'kmeans', 'downscale': 'umap', 'scaling_n_neighbors': 10, 'scaling_min_dist': 0.1, 'scaling_n_components': 100, 'scaling_metric': 'euclidean', 'scaling_random_state': 42, 'kelbow_k': (10, 100), 'kelbow_show': False, 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | [2024-07-17 11:42:24,138] INFO in main_methods: Parsed the following arguments for <clustering_util.ClusteringUtil object at 0x7f23790c3310>:
concept-graphs-api-1 | {'algorithm': 'kmeans', 'downscale': 'umap', 'scaling_n_neighbors': 10, 'scaling_min_dist': 0.1, 'scaling_n_components': 100, 'scaling_metric': 'euclidean', 'scaling_random_state': 42, 'kelbow_k': (10, 100), 'kelbow_show': False, 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | INFO:main:Reading config (graph) ...
concept-graphs-api-1 | [2024-07-17 11:42:24,139] INFO in main_methods: Reading config (graph) ...
concept-graphs-api-1 | INFO:main:No config file provided; using default values
concept-graphs-api-1 | [2024-07-17 11:42:24,140] INFO in graph_creation_util: No config file provided; using default values
concept-graphs-api-1 | INFO:main:Parsed the following arguments for <graph_creation_util.GraphCreationUtil object at 0x7f23790c3cd0>:
concept-graphs-api-1 | {'cluster_distance': 0.7, 'cluster_min_size': 4, 'graph_cosine_weight': 0.6, 'graph_merge_threshold': 0.95, 'graph_weight_cut_off': 0.5, 'graph_unroll': False, 'graph_simplify': 0.5, 'graph_simplify_alg': 'significance', 'graph_sub_clustering': False, 'restrict_to_cluster': True, 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | [2024-07-17 11:42:24,140] INFO in main_methods: Parsed the following arguments for <graph_creation_util.GraphCreationUtil object at 0x7f23790c3cd0>:
concept-graphs-api-1 | {'cluster_distance': 0.7, 'cluster_min_size': 4, 'graph_cosine_weight': 0.6, 'graph_merge_threshold': 0.95, 'graph_weight_cut_off': 0.5, 'graph_unroll': False, 'graph_simplify': 0.5, 'graph_simplify_alg': 'significance', 'graph_sub_clustering': False, 'restrict_to_cluster': True, 'corpus_name': 'test_data_source_3'}
concept-graphs-api-1 | /usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
concept-graphs-api-1 | _torch_pytree._register_pytree_node(
concept-graphs-api-1 | WARNING:root:There are trigger types in the termset that are not expected by negspacy and won't be processed: {'none', 'preceding_speculation', 'following_speculation'}
concept-graphs-api-1 | [2024-07-17 11:42:27,426] WARNING in negation: There are trigger types in the termset that are not expected by negspacy and won't be processed: {'none', 'preceding_speculation', 'following_speculation'}
100%|██████████| 2938/2938 [04:43<00:00, 10.38it/s]
concept-graphs-api-1 | INFO:root:Creating Sentence Embedding with 'None'
concept-graphs-api-1 | [2024-07-17 11:47:15,844] INFO in embedding_functions: Creating Sentence Embedding with 'None'
concept-graphs-api-1 | INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: Sahajtomar/German-semantic
concept-graphs-api-1 | [2024-07-17 11:47:15,844] INFO in SentenceTransformer: Load pretrained SentenceTransformer: Sahajtomar/German-semanti
concept-graphs-api-1 | INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
concept-graphs-api-1 | [2024-07-17 11:47:54,135] INFO in SentenceTransformer: Use pytorch device_name: cpu
concept-graphs-api-1 | Saved under: /rest_api/tmp/test_data_source_3/test_data_source_3_data.pickle
Batches: 100%|██████████| 187/187 [02:50<00:00, 1.10it/s]
concept-graphs-api-1 | INFO:root:Building Concept Cluster ...
concept-graphs-api-1 | [2024-07-17 11:50:54,372] INFO in cluster_functions: Building Concept Cluster ...
concept-graphs-api-1 | INFO:root:UMAP arguments: {'a': None, 'angular_rp_forest': False, 'b': None, 'dens_frac': 0.3, 'dens_lambda': 2.0, 'dens_var_shift': 0.1, 'densmap': False, 'disconnection_distance': None, 'force_approximation_algorithm': False, 'init': 'spectral', 'learning_rate': 1.0, 'local_connectivity': 1.0, 'low_memory': True, 'metric': 'euclidean', 'metric_kwds': None, 'min_dist': 0.1, 'n_components': 100, 'n_epochs': None, 'n_jobs': -1, 'n_neighbors': 10, 'negative_sample_rate': 5, 'output_dens': False, 'output_metric': 'euclidean', 'output_metric_kwds': None, 'precomputed_knn': (None, None, None), 'random_state': 42, 'repulsion_strength': 1.0, 'set_op_mix_ratio': 1.0, 'spread': 1.0, 'target_metric': 'categorical', 'target_metric_kwds': None, 'target_n_neighbors': -1, 'target_weight': 0.5, 'tqdm_kwds': None, 'transform_mode': 'embedding', 'transform_queue_size': 4.0, 'transform_seed': 42, 'unique': False, 'verbose': False}
concept-graphs-api-1 | [2024-07-17 11:50:54,386] INFO in cluster_functions: UMAP arguments: {'a': None, 'angular_rp_forest': False, 'b': None, 'dens_frac': 0.3, 'dens_lambda': 2.0, 'dens_var_shift': 0.1, 'densmap': False, 'disconnection_distance': None, 'force_approximation_algorithm': False, 'init': 'spectral', 'learning_rate': 1.0, 'local_connectivity': 1.0, 'low_memory': True, 'metric': 'euclidean', 'metric_kwds': None, 'min_dist': 0.1, 'n_components': 100, 'n_epochs': None, 'n_jobs': -1, 'n_neighbors': 10, 'negative_sample_rate': 5, 'output_dens': False, 'output_metric': 'euclidean', 'output_metric_kwds': None, 'precomputed_knn': (None, None, None), 'random_state': 42, 'repulsion_strength': 1.0, 'set_op_mix_ratio': 1.0, 'spread': 1.0, 'target_metric': 'categorical', 'target_metric_kwds': None, 'target_n_neighbors': -1, 'target_weight': 0.5, 'tqdm_kwds': None, 'transform_mode': 'embedding', 'transform_queue_size': 4.0, 'transform_seed': 42, 'unique': False, 'verbose': False}
concept-graphs-api-1 | [2024-07-17 11:51:34,328] INFO in cluster_functions: -- Calculating K-Elbow ...
concept-graphs-api-1 | INFO:root:---- shape of embeddings: ((5965, 100))
concept-graphs-api-1 | [2024-07-17 11:51:34,328] INFO in cluster_functions: ---- shape of embeddings: ((5965, 100))
concept-graphs-api-1 | INFO:root:---- Arguments: {'k': (10, 100), 'show': False}
concept-graphs-api-1 | [2024-07-17 11:51:34,329] INFO in cluster_functions: ---- Arguments: {'k': (10, 100), 'show': False}
concept-graphs-api-1 | INFO:root:-- Clustering ...
concept-graphs-api-1 | [2024-07-17 11:51:56,398] INFO in cluster_functions: -- Clustering ...
concept-graphs-api-1 | INFO:root: (kmeans) with Arguments: {}
concept-graphs-api-1 | Number of Clusters: 15
concept-graphs-api-1 | [2024-07-17 11:51:56,398] INFO in cluster_functions: (kmeans) with Arguments: {}
concept-graphs-api-1 | Number of Clusters: 15
concept-graphs-api-1 | /usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py:543: UserWarning: The parameter 'stop_words' will not be used since 'analyzer' != 'word'
concept-graphs-api-1 | warnings.warn(
concept-graphs-api-1 | INFO:root:Building Document Concept Matrix with following arguments:
concept-graphs-api-1 | {'cluster_distance': 0.7, 'cluster_min_size': 4, 'cluster_exclusion_ids': None, 'graph_unroll': False, 'graph_simplify': 0.5, 'graph_simplify_alg': 'significance', 'graph_sub_clustering': False, 'graph_distance_cutoff': 0.5, 'connection_distance': 2, 'restrict_to_cluster': True, 'filter_min_df': 1, 'filter_max_df': 1.0, 'filter_stop': [], 'break_after_graph_creation': True, 'graph_cosine_weight': 0.6, 'graph_merge_threshold': 0.95, 'graph_weight_cut_off': 0.5, 'self': <cluster_functions.WordEmbeddingClustering._ConceptGraphClustering object at 0x7f2346ad4fd0>}
concept-graphs-api-1 | [2024-07-17 11:52:15,464] INFO in cluster_functions: Building Document Concept Matrix with following arguments:
concept-graphs-api-1 | {'cluster_distance': 0.7, 'cluster_min_size': 4, 'cluster_exclusion_ids': None, 'graph_unroll': False, 'graph_simplify': 0.5, 'graph_simplify_alg': 'significance', 'graph_sub_clustering': False, 'graph_distance_cutoff': 0.5, 'connection_distance': 2, 'restrict_to_cluster': True, 'filter_min_df': 1, 'filter_max_df': 1.0, 'filter_stop': [], 'break_after_graph_creation': True, 'graph_cosine_weight': 0.6, 'graph_merge_threshold': 0.95, 'graph_weight_cut_off': 0.5, 'self': <cluster_functions.WordEmbeddingClustering._ConceptGraphClustering object at 0x7f2346ad4fd0>}
concept-graphs-api-1 | INFO:root:Building Concept Graphs... (exclusion_ids: [])
concept-graphs-api-1 | [2024-07-17 11:52:15,465] INFO in cluster_functions: Building Concept Graphs... (exclusion_ids: [])
concept-graphs-api-1 | INFO:root:Filtering phrases
concept-graphs-api-1 | [2024-07-17 11:52:15,465] INFO in cluster_functions: Filtering phrases
concept-graphs-api-1 | Saved under: /rest_api/tmp/test_data_source_3/test_data_source_3_embedding.pickle
concept-graphs-api-1 | Saved under: /rest_api/tmp/test_data_source_3/test_data_source_3_clustering.pickle
53%|█████▎ | 8/15 [00:00<00:00, 23.32it/s]
concept-graphs-api-1 | INFO:root:Cutting edges (significance)...
concept-graphs-api-1 | [2024-07-17 11:52:17,016] INFO in cluster_functions: Cutting edges (significance)...
100%|██████████| 8/8 [00:00<00:00, 12.48it/s]

I don't really care whether it succeeds or fails. It just doesn't seem right that the pipeline is declared "successful" in the first response when it has actually failed or is still running.
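One way to avoid declaring the pipeline successful too early is to derive the reported status from the state of every step. The rule is language-agnostic; it is written in TypeScript here for consistency, although concept-graphs itself is a Python service. `StepState`, `OverallState` and `overallStatus` are hypothetical names, not existing backend or concept-graphs code.

```typescript
// Hypothetical per-step states; SUCCESSFUL and FAILED match the thread's wording.
type StepState = "PENDING" | "RUNNING" | "SUCCESSFUL" | "FAILED";
type OverallState = "RUNNING" | "SUCCESSFUL" | "FAILED";

// Derives the pipeline status reported to the client from its steps:
// any failed step makes the pipeline FAILED; it is SUCCESSFUL only once every
// step has succeeded; anything else (pending or running steps, or no steps
// registered yet) is reported as RUNNING.
function overallStatus(steps: Record<string, StepState>): OverallState {
  const states = Object.values(steps);
  if (states.includes("FAILED")) return "FAILED";
  if (states.length > 0 && states.every((s) => s === "SUCCESSFUL")) return "SUCCESSFUL";
  return "RUNNING";
}
```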

ChristophB commented 2 months ago

It looks like the concept-graphs Docker container is eating up my memory and dying.