Closed SeverusYixin closed 1 month ago
Hi @haesleinhuepf, would you mind helping me review this code?
Hi @SeverusYixin,
I don't feel qualified to review the .js files.
Just two general suggestions:
- Write a comment here and there, e.g. at the very beginning of index_data.py explaining what the file does, or what a longer code block does.
- Consider splitting code into functions in case it does multiple things. index_data.py looks a bit like spaghetti code.
Out of curiosity I asked Claude to optimize the code and make it less spaghetti-like, and this is what it came up with:
```python
import json
import yaml
import logging
from elasticsearch import Elasticsearch
import os

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Configuration
ES_HOST = 'localhost'
ES_PORT = 9200
ES_SCHEME = 'http'
ES_AUTH = ('admin', 'admin123')
ES_INDEX = 'bioimage-training'
BASE_PATH = os.path.join(os.path.dirname(__file__), '..', '..', 'resources')
YAML_FILES = [
    'blog_posts.yml',
    'events.yml',
    'materials.yml',
    'nfdi4bioimage.yml',
    'papers.yml',
    'workflow-tools.yml',
    'youtube_channels.yml'
]


def connect_to_elasticsearch():
    """Establish connection to Elasticsearch."""
    return Elasticsearch([{'host': ES_HOST, 'port': ES_PORT, 'scheme': ES_SCHEME}], basic_auth=ES_AUTH)


def read_yaml_file(file_path):
    """Read and parse YAML file."""
    try:
        with open(file_path, 'r') as file:
            return yaml.safe_load(file)
    except FileNotFoundError:
        logger.error(f"File not found: {file_path}")
    except yaml.YAMLError:
        logger.error(f"Error reading YAML file: {file_path}")
    return None


def index_data(es, data):
    """Index data into Elasticsearch."""
    if not isinstance(data, list):
        logger.error(f"Data is not a list: {data}")
        return

    for item in data:
        if not isinstance(item, dict):
            logger.error(f"Item is not a dictionary: {item}")
            continue
        try:
            es.index(index=ES_INDEX, body=item)
            logger.info(f"Indexed item: {item}")
        except Exception as e:
            logger.error(f"Error indexing item: {item} - {e}")


def main():
    es = connect_to_elasticsearch()

    for file_name in YAML_FILES:
        file_path = os.path.join(BASE_PATH, file_name)
        logger.info(f"Processing file: {file_path}")

        content = read_yaml_file(file_path)
        if content is None:
            continue

        data = content.get('resources', [])
        logger.info(f"Data read from file: {data}")
        index_data(es, data)

    logger.info("Data indexing complete.")


if __name__ == "__main__":
    main()
```
I'm not proposing to use this code and I haven't tested it. I just presume that mid-/long-term such code is easier to maintain if it is written in small, well documented, reusable functions.
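As a rough illustration of the maintainability point: once the indexing step is its own function, it can be checked without a running Elasticsearch instance by passing in a stand-in client. The sketch below is an assumption-laden example, not part of the PR. `FakeElasticsearch` and `check_index_data_skips_invalid_items` are hypothetical names, and `index_data` is a simplified copy of the function above (without the per-item try/except).

```python
import logging

logger = logging.getLogger(__name__)

ES_INDEX = "bioimage-training"  # same index name as in the script above


def index_data(es, data):
    """Index each dictionary in `data`; skip anything that is not a dictionary."""
    if not isinstance(data, list):
        logger.error("Data is not a list: %r", data)
        return
    for item in data:
        if not isinstance(item, dict):
            logger.error("Item is not a dictionary: %r", item)
            continue
        es.index(index=ES_INDEX, body=item)


class FakeElasticsearch:
    """Hypothetical stand-in that records index() calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def index(self, index, body):
        self.calls.append((index, body))


def check_index_data_skips_invalid_items():
    """Only the two dictionaries should reach the client; the string is skipped."""
    fake = FakeElasticsearch()
    index_data(fake, [{"name": "a"}, "not a dict", {"name": "b"}])
    assert fake.calls == [
        (ES_INDEX, {"name": "a"}),
        (ES_INDEX, {"name": "b"}),
    ]


check_index_data_skips_invalid_items()
```

The same idea should carry over to `read_yaml_file`, which could be pointed at a small temporary file in a test.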
Best, Robert
That's plenty, it will help me standardize my code formatting a bit. Thank you very much :)
A first implementation of the connection between the search engine and the YAML database is included in this version.