glato / emerge

Emerge is a browser-based interactive codebase and dependency visualization tool for many different programming languages. It supports some basic code quality and graph metrics and provides a simple and intuitive way to explore and analyze a codebase by using graph structures.
MIT License
783 stars 46 forks source link

.tsx support #11

Open gustaff-weldon opened 2 years ago

gustaff-weldon commented 2 years ago

Is your feature request related to a problem? Please describe.

I'm trying to run emerge against a big React app codebase. python emerge.py -c frontend.yml

---
project_name: frontend
loglevel: info
analyses:
- analysis_name: check_tsx_files
  source_directory: <some-project-path>/frontend/src
  only_permit_languages:
  - javascript
  only_permit_file_extensions:
  - .tsx
  - .ts
  ignore_dependencies_containing:
  - node_modules
  ignore_files_containing:
  - spec.ts
  - story.ts
  - mock.ts
  - .snap
  - .gql
  file_scan:
  - number_of_methods
  - source_lines_of_code
  - dependency_graph
  - louvain_modularity
  - fan_in_out
  export:
  - directory: <some-project-path>/frontend
  - graphml
  - dot
  - json
  - tabular_file
  - tabular_console_overall
  - d3

I'm getting tons and tons of messages:

.tsx is an unknown extension, ignoring `/<edited>/src/Issues/Issues/Issues.tsx`

Pretty much the whole app gets skipped and not mapped.

Describe the solution you'd like I'd like to be able to analyze a codebase that contains tsx which are essentially jsx with Typescript.

Given the size of our app, the ability to visualise it and check in an interactive way, would be an immense help.

glato commented 2 years ago

@gustaff-weldon sounds like an interesting enhancement. Could you point me maybe to some example project on GitHub that has a similar mixture of tsx/jsx files, which I could use for an analysis? Then I'll see if I can do something about this on the following weekend šŸ˜‰.

gustaff-weldon commented 2 years ago

@glato that would be awesome!

I have found a couple of projects that might be handy:

Typescript only: tsx, ts https://github.com/cypress-io/cypress-realworld-app https://github.com/HospitalRun/hospitalrun-frontend https://github.com/excalidraw/excalidraw/

More known, mixed language types: https://github.com/grafana/grafana/tree/main/packages https://github.com/getsentry/sentry

Javascript js, jsx:

Of course, I'm happy to help with testing on our codebase, we have a fairly big app (~5000 ts/tsx files), I cannot share as it's a private repo.

Let me know if that's enough :)

glato commented 2 years ago

@gustaff-weldon I've pushed an update in the current version 0.19.1 to be ready for parsing .tsx and .jsx files also. I've tested this on https://github.com/cypress-io/cypress-realworld-app and https://github.com/oldboyxx/jira_clone. This already looked promising. I setup the following config as an example to be able to include both filetypes:

---
project_name: tsx example
loglevel: info
analyses:
- analysis_name: simple check
  source_directory: <some-project-path>/github/cypress-realworld-app
  only_permit_languages:
  - javascript
  - typescript
  only_permit_file_extensions:
  - .js
  - .jsx
  - .ts
  - .tsx
  file_scan:
  - number_of_methods
  - source_lines_of_code
  - dependency_graph
  - fan_in_out
  - louvain_modularity
  export:
  - directory: <some-export-path>/tmp/emerge/export/tsx
  - graphml
  - dot
  - json
  - tabular_file
  - tabular_console_overall
  - d3

I'd appreciate any feedback if this helps and solves the issue? Or if there are some details which does not work out as expected?

gustaff-weldon commented 2 years ago

Hi @glato,

First of all, thank you so mych for your effort, much appreciated. I will try to help with testing.

For starters, I have run this against our codebase, and while it worked (ie. looks better than before :)) I'm a little bit suspicious about the number of files traversed.

Are you following imports or just going through all the matching files in the directories?

The reason I'm asking is because we are using alias in the import, think of @myco/utils/date to avoid ../../../../../utils/date We are usually providing those aliases in tsconfig and webpack config files. This is quite a common practice in big JS projects.

Alias-related part of config for Typescript:

  "compilerOptions": {
    "paths": {
      "@myco/*": [
        "./src/*"
      ]
    },

Alias-related part of config for Webpack:

  resolve: {
    alias: {
      '@myco': resolve('src')
    },

If you do not resolve aliases, we might need to add this. I'm happy to provide those as a part of the yml config, or if you prefer, you can parse provided tsconfig.json file.

I will get back to you as soon as I do more testing. But I wanted to clarify this alias thing.

glato commented 2 years ago

@gustaff-weldon Hi there, emerge currently tries to do a hybrid approach here: First it traverses recursively through all files from a starting directory and collecting every import statement within each file that it can find. For any recognized import statement, it tries to 'resolve' it to a correct POSIX-kind of filesystem path (e.g. an import of the kind 'foo/bar/../baz.ts' should be resolved to 'foo/baz.ts'). import aliases of the kind @myco/utils/date are currently not part of the resolver. If you say that this kind of import aliases are often used in js/ts project, this could indeed be an useful update. I'll see if I can find some further examples to analyze and test such import aliases and see about an update on this issue.

gustaff-weldon commented 2 years ago

@glato using aliases is quite a common thing in bigger JS projects. I will see if I can find some examples to help with testing

glato commented 2 years ago

@gustaff-weldon I've just pushed a small update in 0.19.2 that might solve your problems:

There is (currently) an undocumented yaml config section replace_dependency_substrings where you could easily define dependency name substring replacements like the following:

replace_dependency_substrings:
- "@foo": src/foo
- "@bar": src/bar

so that import aliases/prefixes like @foo would be replaced by src/foo in any dependency. The only restriction here is that the import alias keyword is wrapped in "..." since this seems to be a YAML/PyYAML restriction also for some characters (including @).

This means that an example TS-configuration might look like this:

---
project_name: tsx example
loglevel: info
analyses:
- analysis_name: simple check
  source_directory: /Users/user/github/cypress-realworld-app
  only_permit_languages:
  - javascript
  - typescript
  only_permit_file_extensions:
  - .js
  - .jsx
  - .ts
  - .tsx
  replace_dependency_substrings:
  - "@foo": src/foo
  - "@bar": src/bar
  file_scan:
  - number_of_methods
  - source_lines_of_code
  - dependency_graph
  - fan_in_out
  - louvain_modularity
  export:
  - directory: /Users/user/tmp/emerge/export/cypress-realworld-app
  - graphml
  - dot
  - json
  - tabular_file
  - tabular_console_overall
  - d3

I would appreciate any feedback if this might solve your problem (by defining a custom configuration in replace_dependency_substrings), then I would consider to also implement this for the other parsers.

glato commented 2 years ago

@gustaff-weldon I've release this import alias feature with version 1.0.0. You can also install this tool very quickly with pip now (pip install emerge-viz). Will be closing this issue in a couple of days if this solves the problem.

gustaff-weldon commented 2 years ago

@glato thanks, I will give it a go. I assume I should use the replace_dependency_substrings: workaround you have suggested. Edit: just noticed in new README import_aliases has been added

gustaff-weldon commented 2 years ago

@glato I wonder if this is how the scan should behave given the following settings:

---
project_name: frontend
loglevel: debug
analyses:
- analysis_name: check_tsx_files
  source_directory: /Users/myuser/frontend/src
  only_permit_languages:
  - javascript
  - typescript
  only_permit_file_extensions:
  - .tsx
  - .ts
  - .js
  import_aliases:
  - "@zen/": src/
  ignore_dependencies_containing:
  - node_modules
  ignore_files_containing:
  - .spec.ts
  - .spec.tsx
  - .story.ts
  - .story.tsx
  - .mock.ts
  - .mock.tsx
  - .snap
  - .gql
  - .generated.ts
  file_scan:
  - number_of_methods
  - source_lines_of_code
  - dependency_graph
  - fan_in_out
  - louvain_modularity
  export:
  - directory: ./report/
  - graphml
  - dot
  - json
  - tabular_file
  - tabular_console_overall
  - d3

Notice that I exclude all spec files, yet I see in console:

2021-12-06 23:56:20     parser D ā© extracting imports from base result SettingsRepository.spec.ts...
2021-12-06 23:59:44     parser D ā© extracting imports from base result CaptureProfileRolesPage.spec.tsx...

I would expect those to be excluded? It seems to be a problem with double extension ones, as eg. .css works fine:

2021-12-07 00:09:40   analysis I ā© .css is not allowed in the scan, ignoring /Users/myuser//frontend/src/App/styles/fonts.css
gustaff-weldon commented 2 years ago

1) Also, even without imports, I'm not sure if imports resolution actually works. I have run some tests on a very limited part of our codebase and I'm getting some dubious results.

Eg. given such simple folder:

.
ā”œā”€ā”€ OperationsSettingsSearch.spec.tsx
ā”œā”€ā”€ OperationsSettingsSearch.tsx
ā””ā”€ā”€ index.ts

where index.ts looks like this:

export { default } from './OperationsSettingsSearch';

The resulting graph does not have a node index.ts which depends on OperationsSettingsSearch. index.ts is shown with no dependencies.

2) I have a feeling the way aliases are done do not solve the problem. For example:

2021-12-07 00:30:30     parser D ā© adding import: react to OperationsSettingsSearch/OperationsSettingsSearch.tsx
2021-12-07 00:30:30     parser I ā© renamed dependency: @zen/Components/Search -> src/Components/Search

given the files structure:

src_min
ā””ā”€ā”€ Components/Search
    ā”œā”€ā”€ Search.spec.tsx
    ā”œā”€ā”€ Search.tsx
    ā”œā”€ā”€ __snapshots__
    ā”‚Ā Ā  ā””ā”€ā”€ Search.spec.tsx.snap
    ā””ā”€ā”€ index.ts
    OperationsSettings/OperationsSettingsSearch
    ā”œā”€ā”€ OperationsSettingsSearch.spec.tsx
    ā”œā”€ā”€ OperationsSettingsSearch.tsx
    ā””ā”€ā”€ index.ts

with alias @zen: src and

src_min/OperationsSettings/OperationsSettingsSearch/OperationsSettingsSearch.tsx relying on src_min/Components/Search.tsx like this:

import Search from '@zen/Components/Search';

and src_min/Components/Search/index.ts relying on the same src/Components/Search.tsx like this:

import Search from './Search';

I'm getting two separate dependency chains.

Screenshot 2021-12-07 at 00 50 46

I feel like, the tool does not include JS/TS import resolution logic, as technically those two dependencies below, are the same:

Screenshot 2021-12-07 at 00 52 22 Screenshot 2021-12-07 at 00 52 16

I'm not sure what would be the best way, to share with you, even that limited example. I'm happy to connect on Discord eg.

rafaelrozon commented 5 months ago

Hello, any updates on this? I'm also trying to run emerge on a large Typescript+React codebase and I'm getting a bunch of

NotImplementedError: currently not implemented in TYPESCRIPT_PARSER

Thanks so much for this project. I think it will be very helpful to an initiative I have at work.